Capstone
Student Information
- 1. Student_ID – Unique identifier assigned to each student.
- 2. Name – Full name of the student.
- 3. Gender – Gender of the student (Male/Female/Other).
- 4. Age – Age of the student in years.
- 5. Education_Level – Highest education level completed by the student.
- 6. Employment_Status – Employment status of the student.
- 7. City – City where the student resides.
- 8. Device_Type – Type of device used by the student.
- 9. Internet_Connection_Quality – Quality of the student’s internet connection.
Course Information
- 10. Course_ID – Unique identifier of the course.
- 11. Course_Name – Name of the course enrolled.
- 12. Category – Category/subject area of the course.
- 13. Course_Level – Difficulty level of the course.
- 14. Course_Duration_Days – Duration of the course in days.
- 15. Instructor_Rating – Rating of the course instructor.
Student Engagement Metrics
- 16. Login_Frequency – Number of login sessions by the student.
- 17. Average_Session_Duration_Min – Average time (in minutes) spent per session.
- 18. Video_Completion_Rate – Percentage of course videos completed.
- 19. Discussion_Participation – Participation level in course discussion forums.
- 20. Time_Spent_Hours – Total hours spent on the course/content.
- 21. Days_Since_Last_Login – Days since the last login.
- 22. Notifications_Checked – Number of notifications viewed or clicked.
- 23. Peer_Interaction_Score – Score indicating peer interactions.
- 24. Assignments_Submitted – Number of assignments submitted.
- 25. Assignments_Missed – Number of assignments missed.
- 26. Quiz_Attempts – Number of quiz attempts made.
- 27. Quiz_Score_Avg – Average quiz score.
- 28. Project_Grade – Grade of the final project.
- 29. Progress_Percentage – Percentage of course completion.
- 30. Rewatch_Count – Number of times lessons/videos were replayed.
Enrollment & Payment Details
- 31. Enrollment_Date – Date when the student enrolled.
- 32. Payment_Mode – Mode of payment used.
- 33. Fee_Paid – Whether the course fee was paid (Yes/No).
- 34. Discount_Used – Whether a discount was applied (Yes/No).
- 35. Payment_Amount – Final amount paid after discount.
App & Support Interactions
- 36. App_Usage_Percentage – Percentage of usage via mobile app.
- 37. Reminder_Emails_Clicked – Number of reminder emails opened or clicked.
- 38. Support_Tickets_Raised – Number of support tickets raised.
- 39. Satisfaction_Rating – Overall student satisfaction score.
Target Variable
- 40. Completed (Target) – Indicates whether the student completed the course (stored as Completed / Not Completed).
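The sample rows in this notebook show the target stored as text labels rather than numbers. A minimal sketch of one way to encode it as 1/0 before modelling follows; the names `completed`, `label_map`, and `target` are illustrative and not taken from the original notebook.

```python
import pandas as pd

# Toy labels copied from the five sample rows; the real column has 100,000 entries.
completed = pd.Series(
    ["Completed", "Not Completed", "Completed", "Completed", "Completed"],
    name="Completed",
)

# Explicit mapping: any label missing from the dict becomes NaN instead of
# being silently coded as one of the two classes.
label_map = {"Completed": 1, "Not Completed": 0}
target = completed.map(label_map)

# Fail loudly if a label was not covered by the mapping.
assert target.notna().all(), "unexpected label in Completed"
target = target.astype("int64")

print(target.tolist())  # [1, 0, 1, 1, 1]
```

An explicit dictionary is used here instead of an automatic encoder so that the meaning of 1 and 0 is fixed and visible in the code.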
| | Student_ID | Name | Gender | Age | Education_Level | Employment_Status | City | Device_Type | Internet_Connection_Quality | Course_ID | Course_Name | Category | Course_Level | Course_Duration_Days | Instructor_Rating | Login_Frequency | Average_Session_Duration_Min | Video_Completion_Rate | Discussion_Participation | Time_Spent_Hours | Days_Since_Last_Login | Notifications_Checked | Peer_Interaction_Score | Assignments_Submitted | Assignments_Missed | Quiz_Attempts | Quiz_Score_Avg | Project_Grade | Progress_Percentage | Rewatch_Count | Enrollment_Date | Payment_Mode | Fee_Paid | Discount_Used | Payment_Amount | App_Usage_Percentage | Reminder_Emails_Clicked | Support_Tickets_Raised | Satisfaction_Rating | Completed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | STU100000 | Vihaan Patel | Male | 19.0 | Diploma | Student | Indore | Laptop | Medium | C102 | Data Analysis with Python | Programming | Intermediate | 60 | 4.7 | 3.0 | 30 | 55.0 | 2.0 | 0.5 | 1 | 6 | 4.3 | 8.0 | 1 | 5 | 80.9 | 71.2 | 70.8 | 0.0 | 01-06-2024 | Scholarship | No | No | 1740 | 49.0 | 3 | 4 | 3.5 | Completed |
| 1 | STU100001 | Arjun Nair | Female | 17.0 | Bachelor | Student | Delhi | Laptop | Low | C106 | Machine Learning A-Z | Programming | Advanced | 90 | 4.6 | 4.0 | 37 | 84.1 | 2.0 | 0.9 | 3 | 5 | 7.8 | 4.0 | 6 | 3 | 78.4 | 42.5 | 55.6 | 2.0 | 27-04-2025 | Credit Card | Yes | No | 6147 | 86.0 | 0 | 0 | 4.5 | Not Completed |
| 2 | STU100002 | Aditya Bhardwaj | Female | 34.0 | Master | Student | Chennai | Mobile | Medium | C101 | Python Basics | Programming | Beginner | 45 | 4.6 | 5.0 | 9 | 75.6 | 3.0 | 0.5 | 19 | 5 | 6.7 | 8.0 | 2 | 3 | 100.0 | 87.9 | 78.8 | 2.0 | 20-01-2024 | NetBanking | Yes | No | 4280 | 85.0 | 1 | 0 | 5.0 | Completed |
| 3 | STU100003 | Krishna Singh | Female | 29.0 | Diploma | Employed | Surat | Mobile | High | C105 | UI/UX Design Fundamentals | Design | Beginner | 40 | 4.4 | 2.0 | 27 | 63.3 | 1.0 | 7.4 | 19 | 9 | 6.4 | 0.0 | 10 | 4 | 59.1 | 51.4 | 24.7 | 4.0 | 13-05-2025 | UPI | Yes | No | 3812 | 42.0 | 2 | 3 | 3.8 | Completed |
| 4 | STU100004 | Krishna Nair | Female | 19.0 | Master | Self-Employed | Lucknow | Laptop | Medium | C106 | Machine Learning A-Z | Programming | Advanced | 90 | 4.6 | 2.0 | 36 | 86.4 | 1.0 | 0.5 | 4 | 7 | 7.5 | 5.0 | 5 | 8 | 84.8 | 93.0 | 64.9 | 4.0 | 19-12-2024 | Debit Card | Yes | Yes | 5486 | 91.0 | 3 | 0 | 4.0 | Completed |
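In the sample rows, `Enrollment_Date` appears in day-month-year form and `Fee_Paid` / `Discount_Used` hold Yes/No strings. The sketch below shows one possible way to convert them, assuming the whole column follows the `DD-MM-YYYY` pattern seen in these five rows; the frame `sample` is a hypothetical stand-in for the full dataset.

```python
import pandas as pd

# Values copied from the five sample rows above.
sample = pd.DataFrame({
    "Enrollment_Date": ["01-06-2024", "27-04-2025", "20-01-2024",
                        "13-05-2025", "19-12-2024"],
    "Fee_Paid": ["No", "Yes", "Yes", "Yes", "Yes"],
    "Discount_Used": ["No", "No", "No", "No", "Yes"],
})

# An explicit format avoids day/month ambiguity (01-06-2024 is 1 June, not 6 January).
sample["Enrollment_Date"] = pd.to_datetime(
    sample["Enrollment_Date"], format="%d-%m-%Y"
)

# Map the Yes/No strings to booleans; unmapped values would become NaN.
for col in ["Fee_Paid", "Discount_Used"]:
    sample[col] = sample[col].map({"Yes": True, "No": False})

print(sample.dtypes)
print(sample["Enrollment_Date"].dt.month.tolist())  # [6, 4, 1, 5, 12]
```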
`<class 'pandas.core.frame.DataFrame'>`
RangeIndex: 100000 entries, 0 to 99999
Data columns (total 40 columns):

| # | Column | Non-Null Count | Dtype |
|---|---|---|---|
| 0 | Student_ID | 100000 non-null | object |
| 1 | Name | 100000 non-null | object |
| 2 | Gender | 100000 non-null | object |
| 3 | Age | 95226 non-null | float64 |
| 4 | Education_Level | 100000 non-null | object |
| 5 | Employment_Status | 100000 non-null | object |
| 6 | City | 100000 non-null | object |
| 7 | Device_Type | 100000 non-null | object |
| 8 | Internet_Connection_Quality | 100000 non-null | object |
| 9 | Course_ID | 100000 non-null | object |
| 10 | Course_Name | 100000 non-null | object |
| 11 | Category | 100000 non-null | object |
| 12 | Course_Level | 100000 non-null | object |
| 13 | Course_Duration_Days | 100000 non-null | int64 |
| 14 | Instructor_Rating | 100000 non-null | float64 |
| 15 | Login_Frequency | 99888 non-null | float64 |
| 16 | Average_Session_Duration_Min | 100000 non-null | int64 |
| 17 | Video_Completion_Rate | 100000 non-null | float64 |
| 18 | Discussion_Participation | 99045 non-null | float64 |
| 19 | Time_Spent_Hours | 100000 non-null | float64 |
| 20 | Days_Since_Last_Login | 100000 non-null | int64 |
| 21 | Notifications_Checked | 100000 non-null | int64 |
| 22 | Peer_Interaction_Score | 100000 non-null | float64 |
| 23 | Assignments_Submitted | 99048 non-null | float64 |
| 24 | Assignments_Missed | 100000 non-null | int64 |
| 25 | Quiz_Attempts | 100000 non-null | int64 |
| 26 | Quiz_Score_Avg | 100000 non-null | float64 |
| 27 | Project_Grade | 100000 non-null | float64 |
| 28 | Progress_Percentage | 99665 non-null | float64 |
| 29 | Rewatch_Count | 97524 non-null | float64 |
| 30 | Enrollment_Date | 100000 non-null | object |
| 31 | Payment_Mode | 100000 non-null | object |
| 32 | Fee_Paid | 100000 non-null | object |
| 33 | Discount_Used | 100000 non-null | object |
| 34 | Payment_Amount | 100000 non-null | int64 |
| 35 | App_Usage_Percentage | 99974 non-null | float64 |
| 36 | Reminder_Emails_Clicked | 100000 non-null | int64 |
| 37 | Support_Tickets_Raised | 100000 non-null | int64 |
| 38 | Satisfaction_Rating | 98918 non-null | float64 |
| 39 | Completed | 100000 non-null | object |

dtypes: float64(14), int64(9), object(17)
memory usage: 30.5+ MB
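The non-null counts in the `info()` summary imply that eight columns contain missing values. As a rough worked example, the sketch below derives the missing count and percentage for each of them from the reported figures; the names `non_null`, `missing`, and `summary` are illustrative only.

```python
import pandas as pd

total_rows = 100_000  # RangeIndex size reported by info()

# Non-null counts copied from the info() summary (only columns below 100000).
non_null = pd.Series({
    "Age": 95_226,
    "Login_Frequency": 99_888,
    "Discussion_Participation": 99_045,
    "Assignments_Submitted": 99_048,
    "Progress_Percentage": 99_665,
    "Rewatch_Count": 97_524,
    "App_Usage_Percentage": 99_974,
    "Satisfaction_Rating": 98_918,
})

# Missing = total rows minus non-null rows, largest gap first.
missing = (total_rows - non_null).sort_values(ascending=False)
missing_pct = (missing / total_rows * 100).round(2)

summary = pd.DataFrame({"missing": missing, "missing_pct": missing_pct})
print(summary)

# On the loaded DataFrame itself, df.isna().sum() yields the same counts directly.
```

Age has the largest gap (4,774 rows, just under 5%), while every other column is missing less than 3% of its values.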
- No duplicate rows are present (the duplicate-row check returns an empty table).
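The duplicate check above and the per-column unique-value listing that follows can be reproduced with a few lines of pandas. This is a sketch on a hypothetical toy frame (`toy`), not the notebook's original code. The counts in the listing include `nan` as a value (for example, Age lists `nan` among its 32 entries), which is consistent with `len(Series.unique())` rather than `Series.nunique()`.

```python
import numpy as np
import pandas as pd

# Hypothetical four-row stand-in for the full dataset.
toy = pd.DataFrame({
    "Student_ID": ["STU100000", "STU100001", "STU100002", "STU100003"],
    "Gender": ["Male", "Female", "Female", "Female"],
    "Age": [19.0, 17.0, np.nan, 29.0],
})

# Rows that repeat an earlier row in every column; empty means no duplicates.
dupes = toy[toy.duplicated()]
print(len(dupes))  # 0

# Per-column listing in the same spirit as the output below.
for col in toy.columns:
    values = toy[col].unique()  # NaN is kept as a distinct value
    print(col, values)
    print(f"Number of unique values in {col} are:{len(values)}")
    print("-" * 80)  # separator line; the length is arbitrary

# nunique() skips NaN by default, so it can be one lower than len(unique()).
print(len(toy["Age"].unique()))  # 4
print(toy["Age"].nunique())      # 3
```

For high-cardinality numeric columns such as `Video_Completion_Rate`, printing only the count (or `describe()`) is usually more readable than listing every value.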
Student_ID ['STU100000' 'STU100001' 'STU100002' ... 'STU199997' 'STU199998' 'STU199999'] Number of unique values in Student_ID are:100000 ---------------------------------------------------------------------------------- Name ['Vihaan Patel' 'Arjun Nair' 'Aditya Bhardwaj' 'Krishna Singh' 'Krishna Nair' 'Rohan Reddy' 'Sai Nair' 'Krishna Desai' 'Vihaan Joshi' 'Vivaan Nair' 'Aditya Gupta' 'Sneha Bhardwaj' 'Rohan Desai' 'Vihaan Mehta' 'Sai Joshi' 'Sai Reddy' 'Rahul Singh' 'Rohan Shah' 'Neha Sharma' 'Pooja Singh' 'Kavya Verma' 'Rahul Kumar' 'Priya Joshi' 'Ritika Reddy' 'Arjun Iyer' 'Sakshi Sharma' 'Ritika Kumar' 'Vihaan Iyer' 'Pooja Sharma' 'Aarav Nair' 'Sai Gupta' 'Ananya Shah' 'Arjun Joshi' 'Aarav Bhardwaj' 'Sneha Bose' 'Vivaan Singh' 'Aditya Nair' 'Arjun Verma' 'Sai Kumar' 'Krishna Bose' 'Sneha Reddy' 'Rohan Singh' 'Krishna Gupta' 'Nikhil Reddy' 'Krishna Sharma' 'Isha Mehta' 'Isha Verma' 'Kavya Desai' 'Vivaan Iyer' 'Rohan Verma' 'Priya Bose' 'Priya Kumar' 'Ananya Mehta' 'Meera Verma' 'Nikhil Nair' 'Vivaan Patel' 'Sai Patel' 'Priya Sharma' 'Sakshi Patel' 'Nikhil Bhardwaj' 'Krishna Iyer' 'Priya Verma' 'Pooja Bose' 'Kavya Bhardwaj' 'Aditya Bose' 'Kavya Shah' 'Ritika Patel' 'Vihaan Bose' 'Rahul Verma' 'Aditya Iyer' 'Aarav Joshi' 'Rahul Patel' 'Arjun Gupta' 'Sneha Singh' 'Sneha Desai' 'Sakshi Nair' 'Sai Desai' 'Aarav Verma' 'Rohan Kumar' 'Priya Shah' 'Krishna Shah' 'Pooja Kumar' 'Sneha Iyer' 'Pooja Patel' 'Sai Bhardwaj' 'Isha Joshi' 'Sakshi Gupta' 'Krishna Kumar' 'Ritika Bose' 'Krishna Joshi' 'Pooja Bhardwaj' 'Ananya Gupta' 'Priya Nair' 'Isha Iyer' 'Sakshi Bhardwaj' 'Ananya Kumar' 'Sakshi Iyer' 'Aditya Kumar' 'Ananya Singh' 'Kavya Mehta' 'Priya Reddy' 'Aarav Bose' 'Isha Singh' 'Rahul Bhardwaj' 'Sakshi Shah' 'Rahul Reddy' 'Kavya Kumar' 'Neha Desai' 'Aditya Verma' 'Pooja Gupta' 'Aarav Shah' 'Meera Patel' 'Isha Gupta' 'Aarav Mehta' 'Sai Shah' 'Aarav Sharma' 'Kavya Joshi' 'Rohan Iyer' 'Vivaan Desai' 'Vihaan Bhardwaj' 'Sneha Verma' 'Nikhil Iyer' 'Rohan Sharma' 'Sakshi Desai' 
'Vivaan Joshi' 'Meera Sharma' 'Kavya Nair' 'Neha Bhardwaj' 'Pooja Shah' 'Neha Bose' 'Meera Shah' 'Neha Gupta' 'Aditya Singh' 'Rohan Nair' 'Rahul Joshi' 'Sai Singh' 'Nikhil Desai' 'Vivaan Gupta' 'Meera Reddy' 'Neha Shah' 'Vivaan Verma' 'Isha Kumar' 'Vihaan Desai' 'Meera Kumar' 'Kavya Sharma' 'Vihaan Gupta' 'Arjun Sharma' 'Sakshi Kumar' 'Meera Gupta' 'Aditya Reddy' 'Vihaan Shah' 'Vivaan Reddy' 'Kavya Bose' 'Vihaan Reddy' 'Meera Bose' 'Isha Shah' 'Kavya Gupta' 'Ritika Nair' 'Rahul Mehta' 'Rohan Bhardwaj' 'Priya Iyer' 'Meera Desai' 'Sneha Mehta' 'Kavya Iyer' 'Nikhil Kumar' 'Neha Reddy' 'Vivaan Bose' 'Krishna Bhardwaj' 'Vivaan Kumar' 'Sneha Nair' 'Pooja Nair' 'Rahul Bose' 'Ananya Nair' 'Priya Patel' 'Vihaan Verma' 'Arjun Mehta' 'Kavya Reddy' 'Sneha Gupta' 'Vivaan Mehta' 'Arjun Desai' 'Pooja Desai' 'Aarav Kumar' 'Nikhil Joshi' 'Aditya Joshi' 'Priya Bhardwaj' 'Sakshi Mehta' 'Rahul Nair' 'Rohan Mehta' 'Isha Nair' 'Neha Joshi' 'Aditya Sharma' 'Vihaan Nair' 'Pooja Iyer' 'Krishna Verma' 'Sneha Shah' 'Nikhil Singh' 'Sakshi Bose' 'Pooja Joshi' 'Isha Reddy' 'Sakshi Reddy' 'Ananya Joshi' 'Sai Verma' 'Ritika Gupta' 'Aditya Mehta' 'Aarav Reddy' 'Nikhil Mehta' 'Sneha Sharma' 'Isha Bhardwaj' 'Arjun Singh' 'Sakshi Joshi' 'Arjun Reddy' 'Ananya Desai' 'Ritika Sharma' 'Nikhil Gupta' 'Sneha Joshi' 'Arjun Bose' 'Sneha Patel' 'Ritika Desai' 'Arjun Bhardwaj' 'Rohan Gupta' 'Ritika Mehta' 'Arjun Shah' 'Vihaan Kumar' 'Meera Nair' 'Neha Singh' 'Ananya Sharma' 'Rahul Gupta' 'Arjun Patel' 'Meera Iyer' 'Kavya Patel' 'Pooja Verma' 'Sakshi Verma' 'Ritika Joshi' 'Neha Nair' 'Isha Desai' 'Nikhil Verma' 'Ritika Verma' 'Rohan Joshi' 'Ritika Iyer' 'Aditya Patel' 'Ananya Reddy' 'Rahul Sharma' 'Meera Bhardwaj' 'Priya Gupta' 'Pooja Mehta' 'Pooja Reddy' 'Ananya Patel' 'Neha Verma' 'Aarav Gupta' 'Sai Mehta' 'Priya Desai' 'Neha Iyer' 'Sakshi Singh' 'Ritika Singh' 'Aarav Desai' 'Aarav Singh' 'Meera Singh' 'Vivaan Shah' 'Ananya Bose' 'Rahul Iyer' 'Vivaan Bhardwaj' 'Sai Sharma' 'Ananya Iyer' 'Vivaan Sharma' 
'Nikhil Shah' 'Sai Bose' 'Aditya Desai' 'Krishna Reddy' 'Nikhil Sharma' 'Vihaan Singh' 'Meera Mehta' 'Rahul Shah' 'Ananya Bhardwaj' 'Ananya Verma' 'Vihaan Sharma' 'Krishna Patel' 'Rahul Desai' 'Isha Patel' 'Nikhil Patel' 'Ritika Bhardwaj' 'Neha Mehta' 'Meera Joshi' 'Isha Bose' 'Priya Singh' 'Rohan Bose' 'Ritika Shah' 'Aarav Iyer' 'Krishna Mehta' 'Rohan Patel' 'Priya Mehta' 'Sai Iyer' 'Aarav Patel' 'Arjun Kumar' 'Sneha Kumar' 'Neha Kumar' 'Aditya Shah' 'Kavya Singh' 'Nikhil Bose' 'Neha Patel' 'Isha Sharma'] Number of unique values in Name are:300 ---------------------------------------------------------------------------------- Gender ['Male' 'Female' 'Other'] Number of unique values in Gender are:3 ---------------------------------------------------------------------------------- Age [19. 17. 34. 29. 21. 22. 23. nan 31. 28. 24. 26. 27. 25. 35. 18. 33. 20. 30. 36. 41. 38. 39. 43. 40. 42. 44. 46. 45. 48. 49. 52.] Number of unique values in Age are:32 ---------------------------------------------------------------------------------- Education_Level ['Diploma' 'Bachelor' 'Master' 'HighSchool' 'PhD'] Number of unique values in Education_Level are:5 ---------------------------------------------------------------------------------- Employment_Status ['Student' 'Employed' 'Self-Employed' 'Unemployed'] Number of unique values in Employment_Status are:4 ---------------------------------------------------------------------------------- City ['Indore' 'Delhi' 'Chennai' 'Surat' 'Lucknow' 'Jaipur' 'Hyderabad' 'Nagpur' 'Kolkata' 'Ahmedabad' 'Pune' 'Mumbai' 'Bengaluru' 'Bhopal' 'Vadodara'] Number of unique values in City are:15 ---------------------------------------------------------------------------------- Device_Type ['Laptop' 'Mobile' 'Tablet'] Number of unique values in Device_Type are:3 ---------------------------------------------------------------------------------- Internet_Connection_Quality ['Medium' 'Low' 'High'] Number of unique values in Internet_Connection_Quality 
are:3 ---------------------------------------------------------------------------------- Course_ID ['C102' 'C106' 'C101' 'C105' 'C103' 'C104' 'C107' 'C108'] Number of unique values in Course_ID are:8 ---------------------------------------------------------------------------------- Course_Name ['Data Analysis with Python' 'Machine Learning A-Z' 'Python Basics' 'UI/UX Design Fundamentals' 'Introduction to AI' 'Digital Marketing Essentials' 'Statistics for Data Science' 'Excel for Business'] Number of unique values in Course_Name are:8 ---------------------------------------------------------------------------------- Category ['Programming' 'Design' 'Marketing' 'Math' 'Business'] Number of unique values in Category are:5 ---------------------------------------------------------------------------------- Course_Level ['Intermediate' 'Advanced' 'Beginner'] Number of unique values in Course_Level are:3 ---------------------------------------------------------------------------------- Course_Duration_Days [60 90 45 40 75 30 50 25] Number of unique values in Course_Duration_Days are:8 ---------------------------------------------------------------------------------- Instructor_Rating [4.7 4.6 4.4 4.5 4.3 4.2 4.1] Number of unique values in Instructor_Rating are:7 ---------------------------------------------------------------------------------- Login_Frequency [ 3. 4. 5. 2. 7. 8. 6. 9. 1. 10. 11. nan 0. 14. 15.] Number of unique values in Login_Frequency are:15 ---------------------------------------------------------------------------------- Average_Session_Duration_Min [30 37 9 27 36 43 12 20 38 28 21 23 34 44 58 42 32 35 39 33 26 13 24 53 49 50 25 17 40 62 51 16 31 46 41 45 29 48 8 19 22 14 5 18 47 52 61 59 10 56 57 11 63 54 15 6 55 65 66 7 60 64 74 67 72 68 76 70 71 69 75 73 81] Number of unique values in Average_Session_Duration_Min are:73 ---------------------------------------------------------------------------------- Video_Completion_Rate [55. 
84.1 75.6 63.3 86.4 85.9 94.5 79.2 82.3 63.4 82.9 64.3 53.5 88.2 73.8 92.4 83.5 55.6 89.5 48.4 66.1 86.3 47.2 49.1 49.7 87.1 22.2 57.3 53.3 23.6 95.4 60.2 55.2 70.4 51.8 64.4 57.2 69.8 63.1 64.2 24.2 65.6 72.2 37. 61.6 48.1 29.3 54.7 40.7 80.4 85.6 47.8 61.9 85.5 71.9 88.3 77.7 62. 76. 56. 38.3 57.1 71.2 56.7 94.2 84.8 75.4 5.6 51.2 28.6 33.2 49. 81.3 83.1 49.9 96.4 98.1 44. 64. 66.4 77.6 39.1 54.6 93.6 52.9 95.2 66.7 38.7 68. 85. 50.4 51.5 64.7 81.6 44.4 32.4 91.4 46.9 40.6 43.9 86.9 49.4 85.1 24.5 38. 71.8 58.6 44.2 86.8 29.9 92. 90.2 57.7 71.3 36.2 81.4 24.4 38.1 22.6 46.1 78.5 96.5 30.5 43.2 67.7 81.8 80.8 81.5 62.1 84.7 61.5 97.7 16. 23.2 12.8 69.3 46.5 78. 33.5 72.5 63.2 28.9 52.5 33.1 53.9 84.2 37.9 73.6 44.9 33.3 68.7 67.9 75.9 85.8 60. 37.3 15.3 27.1 39.6 27.7 32.2 73. 47. 38.5 35.1 83.8 78.9 77.2 76.3 94.4 66. 89.2 55.8 88.8 59.9 39.2 57.4 89.8 44.5 74.5 85.7 71.4 70.8 67.1 64.5 52.1 52. 59.4 51.3 72. 76.8 80.5 74.6 70.7 83.4 82.5 66.8 59.2 61.1 46.8 56.8 36. 92.8 47.4 38.6 25.6 61.7 88. 88.4 67. 69.6 88.1 45.3 72.3 20.2 32.8 78.4 54.1 72.8 83.3 74.4 46. 51.6 89.3 75.3 86. 91.3 34.2 45. 30.4 49.3 52.6 54. 70.3 80.7 67.8 71.6 92.7 24.3 14.7 43.7 58.2 95.7 80.6 83. 45.8 46.2 68.9 51.9 70.6 54.3 76.6 42.4 48.9 33.8 62.8 64.9 92.1 53.2 80.3 71.1 90.7 52.8 82.6 38.9 87.5 76.2 75.5 50.3 64.8 77.4 93. 47.5 25.2 58.8 48.6 52.7 39.8 44.7 33.6 68.3 57.9 66.5 76.5 90. 69. 94.9 53.8 58.3 87.8 34.6 55.9 65.5 35.7 41. 58.7 42.8 30.1 50.7 25. 79.8 89.1 61. 58.9 53.7 78.1 82.1 72.7 82.4 66.6 56.2 93.7 74.3 37.6 69.9 86.6 43.6 74.9 65.3 58.5 92.3 69.2 84.6 79. 85.3 95.1 35. 73.4 54.9 50. 99.2 68.1 63.9 47.3 56.1 65.2 71. 73.9 42.5 90.9 80.1 43. 29.8 45.9 28. 48.5 73.7 50.9 31.7 89.4 45.1 31.3 26.5 67.3 97.3 42.6 42. 37.5 73.2 19.6 57.5 44.1 40. 59.5 72.6 83.2 50.6 79.7 54.4 90.4 94.8 65. 87.6 26.7 96.7 53.4 81.7 52.2 45.4 73.1 38.2 82.7 39.3 47.6 77.8 72.4 43.1 56.5 68.5 76.9 74.7 68.8 75.8 89.9 44.6 65.9 26.1 58.1 95.5 55.4 61.4 70. 
87.9 54.5 68.2 50.2 68.4 45.7 56.3 97.6 48.7 60.8 59. 79.6 59.8 91.2 90.5 46.6 91.8 85.2 74.1 73.5 18.9 53. 77.5 41.3 35.5 65.8 82.2 86.2 27.8 27.2 63.5 83.9 54.2 34.8 56.9 31.6 78.3 62.2 65.1 43.3 26.8 30.2 55.5 10.6 19. 55.7 50.5 26.6 14. 41.4 56.6 75. 96.2 11.7 21.6 52.4 62.5 19.8 57.8 78.7 80.9 77.1 29.5 79.3 34.4 74.2 81.1 56.4 23.1 20.9 79.1 93.3 70.9 86.5 77.3 15.7 94.1 88.5 25.3 76.7 22.1 75.1 36.4 81. 47.9 77. 94.6 37.8 27.5 80. 72.9 61.3 20.4 88.6 27.6 32.5 82.8 41.7 59.1 59.3 59.6 69.4 31. 84.3 41.1 61.2 16.1 28.5 31.8 33.9 68.6 62.4 45.5 84.9 31.2 92.6 33. 36.5 91. 34.5 60.6 41.9 87.3 96.1 95.9 79.4 25.1 94. 46.7 35.9 98.4 88.9 83.7 87. 26.9 76.4 77.9 60.5 67.6 71.7 18. 66.2 70.1 90.1 70.2 60.3 39.7 63.6 66.9 75.7 24.7 43.5 52.3 73.3 46.4 12. 36.6 87.2 17.6 65.7 41.2 32.9 44.8 24.6 51.4 62.7 35.4 34.9 93.1 96.9 43.4 20.5 39. 19.7 6.8 37.2 67.2 75.2 18.1 89.6 99.5 50.8 49.5 69.5 72.1 91.1 92.9 86.7 30.8 69.7 86.1 62.9 91.6 21.8 28.2 58. 60.1 29.1 62.3 79.9 90.3 39.4 48. 89.7 22.7 97.4 19.4 40.8 40.3 41.6 9.5 32. 57. 98.9 51. 40.2 71.5 42.3 15.4 35.3 78.2 80.2 21.4 49.6 58.4 12.9 30.3 21.1 41.8 60.7 15. 51.1 95.8 19.1 64.6 78.6 42.7 32.3 34.1 66.3 49.8 22.3 70.5 17.4 32.7 36.8 33.7 48.8 31.5 84.4 98.5 88.7 24.9 37.1 53.6 32.6 93.2 23.5 54.8 95.3 23.8 48.3 63.7 93.9 78.8 34.7 87.4 40.1 74. 87.7 57.6 67.4 60.9 91.7 43.8 98.2 23.7 28.4 23.3 85.4 98.8 81.2 60.4 63.8 42.9 51.7 37.4 31.9 89. 36.7 97. 16.8 65.4 63. 18.5 21. 42.2 26.3 24.8 79.5 76.1 55.1 22.8 34. 45.6 49.2 47.1 23.9 96. 34.3 38.8 36.3 33.4 95. 16.2 8.3 62.6 7.3 91.5 69.1 99. 12.3 39.9 93.5 96.6 35.6 93.8 12.2 64.1 30. 37.7 82. 40.5 20.8 50.1 44.3 12.1 84.5 92.2 45.2 74.8 31.4 35.8 20.6 67.5 25.4 46.3 8.5 16.4 20. 91.9 12.4 98. 61.8 30.6 42.1 97.2 17.3 97.1 17.7 22. 35.2 22.5 29.7 41.5 81.9 11.4 32.1 38.4 20.3 14.2 19.2 36.9 84. 7.8 24. 40.4 14.1 53.1 14.3 8.4 90.6 16.5 28.1 18.2 21.2 39.5 28.3 29.4 11.1 48.2 28.8 20.7 59.7 26.2 17.5 99.7 31.1 29. 7.6 27.9 98.3 5. 17.8 95.6 26. 
36.1 99.6 26.4 17.1 25.8 92.5 40.9 25.9 12.7 94.3 18.4 94.7 90.8 27. 27.3 17. 17.2 8.6 22.9 93.4 15.2 98.6 13.5 15.8 55.3 17.9 96.8 83.6 13.8 30.9 21.3 97.9 18.3 27.4 5.2 47.7 14.4 15.1 23. 16.6 28.7 25.5 16.3 29.2 21.9 30.7 13.1 21.5 96.3 9.1 29.6 9.3 10.9 6.3 7.4 13.3 11.8 14.9 18.6 10.7 23.4 20.1 13.2 99.1 10.8 98.7 15.6 22.4 10.4 18.8 13.6 10. 14.5 8.8 18.7 6.6 25.7 19.9 21.7 7. 11.9 99.9 99.3 19.5 13. 24.1 15.5 16.7 14.8 14.6 15.9 6.9 97.5 19.3 13.4 8.7 12.5 11.5 13.7 11.6 7.7 9. 12.6 9.9 5.4 9.7 16.9 10.3 6.4 11.3 10.1 97.8 7.5 11.2 8.1 8.9 9.8 10.2 5.9 99.8 10.5 6.7 7.1 9.4 13.9 5.5 99.4 5.8 9.2 9.6 8. 8.2 6.5 7.9 5.1 6. 5.3 6.2 11. 5.7 6.1 7.2] Number of unique values in Video_Completion_Rate are:950 ---------------------------------------------------------------------------------- Discussion_Participation [ 2. 3. 1. 5. 4. 0. 6. 8. nan 9. 12. 11.] Number of unique values in Discussion_Participation are:12 ---------------------------------------------------------------------------------- Time_Spent_Hours [ 0.5 0.9 7.4 11.8 3.3 4.2 2.9 13.4 9.2 10.5 3.5 9.8 7.2 6.3 4.3 1.1 3.7 2.3 6.5 12.6 8. 1. 1.8 18.8 2.4 2.8 10.9 4.4 9.5 9.6 2.2 7.5 7.6 1.3 7. 3.2 0.6 9.3 3.6 7.7 12.3 4.1 13.6 1.9 0.8 8.8 8.2 3.9 8.5 2.1 1.4 11.3 10.6 3.4 4.9 6.8 5.4 8.3 8.4 4.8 2.7 6.6 8.7 10.3 5.7 5.9 6.7 4. 15.6 1.2 4.6 1.6 9.7 11.7 2.5 2. 6. 5.8 3.8 1.5 6.1 5. 12. 13.5 6.4 12.4 3.1 16.5 5.6 6.9 10.8 5.3 5.5 4.7 10.1 13.9 15.9 1.7 3. 4.5 11.6 10.2 5.2 5.1 8.6 7.8 11.5 13.2 14.4 12.2 8.9 8.1 12.9 6.2 7.1 11. 10. 9.4 16.8 13.7 13. 15.4 9.9 2.6 11.1 11.9 0.7 7.9 13.8 13.1 10.7 17.5 9. 15. 15.3 7.3 10.4 9.1 15.7 16.4 12.1 12.5 11.4 18.3 16.6 14.5 14.8 14.2 14.1 14.9 12.7 13.3 15.5 21.8 12.8 15.2 14.7 14.6 19.5 17.6 17.1 21.2 14.3 11.2 17.4 16. 17.7 16.3 14. 17.8 19.3 15.8 16.1 20.1 17.3 17. 18.1 15.1 20.3 22.7 16.2 18.4 16.9 23.4 16.7 18.6 19.7 17.2 18.5 18.2 20.9 23.8 21.5 20.5 17.9 19.2 24.7 19.1 25.6 18. 23.9 24.8 19. 21. 20. 
18.9 20.2 19.8 19.4 20.6 22.9 22.2 18.7 22.5 22.1 21.6 19.6 21.3 20.4 19.9 22.4 21.1 20.8 22. 24. 20.7 23.5] Number of unique values in Time_Spent_Hours are:227 ---------------------------------------------------------------------------------- Days_Since_Last_Login [ 1 3 19 4 0 2 5 11 18 9 8 7 10 12 13 31 6 20 15 22 17 25 37 16 36 34 35 43 14 21 49 63 28 42 32 33 27 23 26 39 29 30 24 41 44 61 70 56 45 46 48 47 38 64 54 60 52 53 40 55 58 50 99 51 97 65 90 66 77 87 57 59 83 69 62 67] Number of unique values in Days_Since_Last_Login are:76 ---------------------------------------------------------------------------------- Notifications_Checked [ 6 5 9 7 2 1 3 4 8 10 11 0 12 13 14 16 15 18 17] Number of unique values in Notifications_Checked are:19 ---------------------------------------------------------------------------------- Peer_Interaction_Score [ 4.3 7.8 6.7 6.4 7.5 6.2 7.2 5.4 6.8 4.9 3.2 7. 10. 4.7 7.9 2.7 7.7 5.6 3.9 6.1 4.6 8.7 4.8 4.2 5. 8.1 3.8 7.3 8.3 8.8 9.4 6.9 3.6 2.8 8. 5.1 3. 5.2 5.3 4.1 7.4 3.4 7.1 9. 6. 9.5 1.7 5.7 4. 8.6 9.2 3.3 8.2 4.4 8.9 7.6 1.2 2.9 6.5 8.4 6.3 5.9 5.5 5.8 1. 3.7 2.2 2.4 9.8 9.3 6.6 9.1 2.6 2.5 3.5 3.1 2.1 8.5 9.9 9.6 4.5 1.9 0.7 9.7 0.9 1.6 2.3 1.3 0.5 2. 0.8 0. 1.5 0.6 1.8 1.1 0.3 1.4 0.4 0.1 0.2] Number of unique values in Peer_Interaction_Score are:101 ---------------------------------------------------------------------------------- Assignments_Submitted [ 8. 4. 0. 5. 7. 3. 10. 2. nan 6. 1.] Number of unique values in Assignments_Submitted are:11 ---------------------------------------------------------------------------------- Assignments_Missed [ 1 6 2 10 5 3 7 0 8 4 9] Number of unique values in Assignments_Missed are:11 ---------------------------------------------------------------------------------- Quiz_Attempts [ 5 3 4 8 6 2 7 1 9 0 12 10 11 13 15 16 14] Number of unique values in Quiz_Attempts are:17 ---------------------------------------------------------------------------------- Quiz_Score_Avg [ 80.9 78.4 100. 
59.1 84.8 99.6 74.4 76.3 55.8 58.6 82.1 81.5 73.3 58.3 68.2 74.1 60.7 59.9 71. 63.6 77.5 91.2 73.1 64.3 67.4 55.1 66.7 86. 80.2 63.4 64.5 87.3 82.9 81.6 77.7 73.7 76.4 87.6 65.3 51.6 66.5 72.6 83.9 71.7 77.4 75.5 62.6 66.4 89.7 69.6 55.7 92.9 63.9 67.2 94.4 87.4 67.3 61.6 80. 62.7 70.2 45.8 73.9 67.8 63. 58.2 70.7 89.3 80.6 74.7 68.5 73.6 87.8 56.4 73.5 84.7 58.5 78.5 67.6 68. 74.9 77.2 97.6 74. 86.7 72.5 71.5 88.4 70.6 73.4 62.5 85.6 55.9 82.2 59.7 48.2 52.5 61.3 79.2 84.6 63.7 57.6 72.3 93.5 80.7 79.9 67.9 71.1 72.1 55.6 90.1 71.4 69.7 72.8 62.4 53.6 75.9 42.5 56.8 66. 59. 84.4 91.3 59.8 64.7 73.2 94.7 84.9 85.2 69. 90.9 78.1 59.4 84. 50.1 99.9 80.1 83.8 67.7 64.9 69.8 74.8 82.7 54.7 75. 77.1 81.3 83.2 77.3 70.9 89.4 91.4 53.2 61.9 76. 74.3 87.7 94.2 90.3 87. 85.9 65.8 59.3 76.9 80.4 57.5 51.8 71.9 64.1 56.1 49.1 56.5 56.9 82.8 90.8 82.6 83.1 79.4 82.4 63.8 72.9 86.4 69.3 52.9 73. 84.1 64.8 82.5 88.8 65.1 58.4 75.6 93.6 71.2 60. 63.3 59.5 61.5 94.3 81.8 75.3 52.8 67.1 69.2 90.5 61.8 57.4 65.4 51.1 60.4 65.6 71.3 49.5 85.8 70. 60.2 70.4 74.6 69.9 89.6 91.5 63.1 79.6 31.2 90.2 96.2 68.3 50.9 92.2 65.9 72.7 62.9 86.6 82. 95.2 65.5 81.7 75.2 60.5 81.1 81.4 72.2 73.8 85.4 78.9 84.5 66.9 90.6 89.9 47.2 79.5 89. 40.4 68.9 79.1 61.1 88.1 71.8 57.9 77.8 88.6 76.2 94.1 80.8 68.4 55.4 51.9 81.9 86.8 61.4 79.7 38.7 93.9 78.6 86.5 62.1 75.8 75.7 74.2 86.1 91.1 49.9 89.8 93. 54. 49.4 68.8 77.6 62.2 78.8 66.8 64.2 47. 66.6 91.8 79. 93.3 47.1 93.1 93.4 91.6 53.8 64.4 50.3 76.7 54.8 81.2 78.3 77. 83.5 82.3 57.7 54.9 69.1 58.7 75.4 65.2 53.5 76.6 48.9 95.7 92.8 83.3 97.7 58.1 65.7 78. 97.4 80.5 72. 63.2 70.5 49.6 88.3 89.2 87.9 78.2 79.8 60.8 56.3 83. 81. 70.3 88.7 57.1 94. 84.3 94.9 59.6 80.3 67. 60.9 45.3 90. 49. 83.4 53. 91.7 66.3 84.2 54.3 53.4 55.2 66.2 86.9 59.2 58. 92.4 56.6 55.5 56.7 70.8 69.4 61.7 57. 95.8 53.9 57.8 46.3 68.1 60.6 43.7 60.3 87.1 51.2 76.8 72.4 61.2 88.2 85.5 62.8 64. 83.6 54.4 77.9 83.7 64.6 94.6 51.3 97.3 44.9 48.4 54.2 71.6 93.7 31. 
86.3 57.2 53.3 58.9 51.4 99.8 43.3 79.3 76.1 96.3 46.9 98.6 66.1 94.5 61. 99. 91.9 92. 40.2 48.3 65. 96. 47.5 97.9 44.7 89.5 49.7 37.7 97.1 67.5 69.5 68.6 53.1 92.6 94.8 86.2 93.8 78.7 49.8 52.7 55.3 51.5 38.3 95.9 76.5 46.8 87.2 95.4 56.2 52.2 47.7 46.2 37.8 62. 85.1 90.4 56. 39.5 45. 42.4 37.5 63.5 85.3 90.7 36.4 40.5 48.1 53.7 50. 96.6 52. 55. 48.8 44.2 89.1 98.8 88.5 52.1 70.1 92.7 62.3 74.5 96.1 58.8 85.7 48.7 92.5 75.1 43.8 52.3 40.9 85. 45.4 99.3 42.3 57.3 60.1 46.1 96.5 50.4 68.7 47.4 51.7 39. 95. 99.2 41.1 34. 45.7 88. 48.5 44.8 96.4 49.2 50.5 92.1 43.5 99.5 99.1 95.6 96.9 98.4 98.9 45.5 95.1 36. 54.6 92.3 43.4 51. 97.8 44.3 97. 45.6 88.9 50.7 52.4 98.3 46. 41. 95.5 99.4 91. 50.2 98.5 98. 48.6 44.4 45.9 41.9 99.7 39.7 46.6 96.7 87.5 36.7 54.1 39.8 47.3 32.7 43.1 40.8 46.7 40.6 52.6 43.9 54.5 50.6 93.2 96.8 50.8 47.9 34.9 45.2 41.7 42.2 43. 31.3 41.2 42.7 44.5 46.5 48. 33.7 47.8 39.6 32.6 97.2 39.1 95.3 42.8 37.3 42. 98.2 43.6 49.3 47.6 98.1 37.1 40.7 38.4 38.9 35.6 40. 38.8 98.7 42.6 44.6 37.2 29.4 39.3 41.3 42.1 36.9 46.4 41.8 44.1 43.2 38.5 97.5 37. 31.1 40.1 42.9 37.9 34.3 28.9 36.8 41.5 32.8 38.2 31.5 33.1 34.7 33.4 41.6 44. 25.7 38.6 35.8 40.3 23.5 36.3 35.4 33.8 35.3 32.2 33.3 29.5 32.4 35.1 30.5 45.1 19.8 39.9 36.1 38. 30.8 39.4 28.1 24.6 19.6 25.2 29.1 39.2 36.5 35.2 35.9 41.4 29.6 34.1 32.9 21.2 30.9 35.7 34.8 26.8 34.4 23. 38.1 30.1 31.6 33.6 33. 25.8 34.5 28.5 32. 32.1 31.4 27.7 37.6 30.7 30.2 36.2 37.4 25.6 26.3 20.4 26.9 35.5 27.1 33.5 27.3 24. 34.6 27.6 29.8] Number of unique values in Quiz_Score_Avg are:714 ---------------------------------------------------------------------------------- Project_Grade [ 71.2 42.5 87.9 51.4 93. 65.6 80.4 72.9 69.4 54.1 100. 54.6 76.7 66.4 61.4 68.9 69. 53.4 76.4 57.3 51.8 74.2 43.5 66.2 51.2 62.8 72.5 57.1 67.2 72.4 61.8 48.8 54.4 76.2 94.4 46.7 67.8 89.2 51.9 65.5 76.6 66.7 57. 75. 73.4 48.6 48. 70. 83.2 71. 38.3 69.5 72.2 93.6 50.3 81.5 68.4 69.9 54.7 82.9 50.5 78.5 47.2 84.2 86. 
52.9 54.5 72.3 59.6 66.8 70.4 59. 70.7 76.8 84.5 64.3 50.8 93.5 62.9 69.8 78.3 78.2 74.1 73.3 59.3 74.9 70.1 46.1 75.4 61. 92.6 42.1 52.8 51.5 79.5 77.1 99.2 62.3 25.8 44.6 61.2 86.8 65. 65.2 55.5 82.2 39.9 32.7 85.9 97.7 37.2 47.6 56. 92.7 66.1 77.9 44.1 61.7 74.7 31.8 16.1 36. 59.8 52.3 53.7 79.2 50.2 47.9 44.3 72.7 76.1 57.4 64.5 74.3 99.6 73.6 71.6 55.2 63.4 81.4 45.5 64.1 72.1 58.6 71.1 61.9 58.7 76.5 79.4 47.5 50.1 52.6 73.7 63.2 55.6 64.2 37.9 71.9 65.9 71.8 87.8 52.1 97.5 49.1 85.6 65.4 41.1 58.9 52.4 59.5 80.3 36.7 68.3 46.8 78.8 73.2 88.9 72.8 60.3 85.5 46.5 79.8 97.1 62.5 64.9 58.5 63.5 89.9 65.8 75.5 72. 62.1 21.3 83.8 53.8 58.2 70.9 60.1 74. 83.3 60.8 66. 87.4 68.5 63.7 43.2 64.8 83.5 51. 79. 63.6 74.8 40.3 90.8 68. 89.4 92.2 50.9 86.1 49.9 38.1 73.1 63.8 80.7 82.7 63.1 48.4 51.7 83.6 81.8 69.1 70.5 54.9 53.5 70.8 78. 56.1 58. 48.2 77.6 78.4 53.1 70.2 96.5 83.1 49.4 82. 60.6 59.4 80.5 93.9 67.9 60.2 66.5 80.1 40.9 54.8 85.8 72.6 77.8 55.3 65.1 68.8 73.9 48.1 33.9 84.6 71.5 52.2 67.1 95.6 83.9 50.7 88.5 63.3 80.8 59.9 66.9 56.4 61.5 93.2 67.6 44.8 55.4 89.3 59.1 75.8 44.2 84.7 86.6 35.9 87.3 55.9 77.7 35.1 62.7 75.1 45.9 53.2 92.8 64.6 57.7 83.7 79.3 67.4 67.5 56.7 84.4 51.3 71.4 59.2 93.4 77.4 75.2 87.5 56.6 69.2 41.6 93.3 58.8 82.6 98.6 84. 68.6 49.3 62. 67.7 74.5 65.7 77.2 89.8 47.4 71.7 48.7 76.9 67.3 96.3 41. 60.7 61.3 58.4 64.7 53.3 35.3 46.3 36.6 53. 85. 48.3 86.5 50. 63. 84.1 70.3 88.7 92.5 43.7 68.7 47.8 59.7 90.2 99. 49.6 60.4 49.7 82.4 64. 65.3 44.9 62.4 54. 69.6 30.7 92.4 96.1 95.7 94.8 40.2 60.5 90.3 53.6 69.3 78.7 67. 84.3 58.1 47. 51.1 70.6 96.8 78.9 91.4 37.1 56.8 38.9 80.6 80.2 87.1 57.8 89.5 94.2 93.1 75.3 51.6 83. 38.8 45. 27. 41.8 68.1 74.4 48.9 79.9 89.7 55.7 61.6 42.3 55.1 57.2 91.3 43. 53.9 73. 60. 86.2 77.5 52.7 55.8 78.1 57.5 97. 82.5 89. 87.6 66.3 46.4 50.4 94.5 94.6 91.9 29.4 73.8 56.9 86.3 35.2 97.8 85.2 87.7 89.1 86.4 61.1 97.3 82.8 99.5 54.2 58.3 96.9 33.4 66.6 79.1 54.3 34.4 91. 
91.7 52.5 95.2 78.6 57.9 45.8 34.2 32.3 81.9 24.2 35.6 81.3 55. 86.9 26.2 44.4 71.3 19.8 84.9 49. 60.9 91.5 75.6 33.2 96. 69.7 43.4 41.4 56.5 49.2 90. 81.7 68.2 86.7 90.5 41.7 74.6 91.6 42.7 81.6 87.2 63.9 95.3 34.6 40.7 47.3 85.3 29.6 92.1 45.2 76.3 29.7 82.3 88.1 18.5 39.5 62.2 56.3 75.9 56.2 40.4 64.4 40.5 46.6 80.9 88.4 44.7 43.8 57.6 37. 46. 85.1 90.9 90.7 77.3 98.8 39.4 42. 98.1 95. 95.5 76. 99.8 44.5 75.7 81.1 37.8 88.3 92.3 91.8 85.4 45.7 43.9 36.1 48.5 38.6 30.8 99.4 81.2 37.3 39.3 92.9 98.5 49.5 39.6 79.6 34.8 36.3 89.6 81. 94.1 45.4 36.2 90.1 91.2 91.1 50.6 36.8 31. 88. 88.2 82.1 99.7 73.5 98. 79.7 19.6 88.6 20.3 28.9 26.4 39.2 26.8 83.4 62.6 97.4 37.4 88.8 40.1 46.9 92. 95.9 14.1 45.3 52. 80. 94. 98.4 38. 43.6 26. 32.9 38.5 99.3 94.3 98.7 85.7 35.4 43.3 40. 95.1 22.6 77. 43.1 90.6 33.6 45.1 19.9 32.2 38.4 44. 97.2 96.6 24.8 49.8 98.9 40.8 42.9 84.8 99.9 29.8 47.7 90.4 39. 47.1 87. 10.7 40.6 97.9 37.7 96.7 45.6 99.1 36.9 41.5 42.8 29. 36.4 30.6 41.3 93.7 96.2 18.2 42.2 42.4 35. 46.2 30.2 41.2 25.2 25. 94.7 95.8 30. 34. 32.6 93.8 30.4 34.5 98.3 20.6 37.6 39.7 22.9 37.5 29.3 24. 34.3 23.4 28.2 19.2 31.7 41.9 26.6 22. 35.5 25.9 94.9 38.2 31.6 21.4 27.7 16.7 98.2 31.3 26.3 28.1 23.3 31.9 25.7 42.6 31.1 28.6 32. 97.6 28. 28.8 31.4 33.8 96.4 95.4 33. 33.3 31.2 22.3 27.9 29.9 15.5 35.8 25.4 32.4 34.1 38.7 35.7 19.5 39.8 30.1 39.1 32.8 36.5 21.8 28.7 20.1 29.1 29.5 23.2 24.6 30.5 28.3 29.2 30.9 34.7 21.9 33.1 21.7 25.6 30.3 15.6 23.6 20.8 34.9 27.5 23.9 33.7 27.6 25.3 20.4 24.4 33.5 23.7 24.7 24.3 31.5 18.7 15.7 18.4 26.5 11.5 22.8 19.7 23. 22.5 18. 24.5 6.8 26.7 27.8 26.1 8.7 13.4 32.5 23.1 16.6 26.9 27.4 22.4 28.4 22.7 28.5 18.9 19.4 23.8 32.1 10.1 27.2 11.3 20.5 13.9 16.9 17.3 20.7 14.4 20.9 15.9 12.9 21.1 20.2 18.8 5.3 16. 22.2 18.3 23.5 27.1 27.3 25.1 20. 19. 22.1 13.7 16.4 21. 9.5 21.5 5.5 17.4 21.2 14.7 24.9 17.1 11.9 14.9 8.5 14.8 25.5 14.2 19.3 10.5 11.6 7.4 17.6 11.4 15.4 13.6 16.8 14.5 18.6 17.8 17.9 0. 
15.2 11.2] Number of unique values in Project_Grade are:865 ---------------------------------------------------------------------------------- Progress_Percentage [70.8 55.6 78.8 24.7 64.9 75.3 55.1 73.4 92.3 42.9 52. 55.2 40.6 64.2 40.9 65.8 63.3 34.9 58. 32.8 80. 44.8 56.2 37.4 56.7 63.1 53.7 45.5 29.7 72.7 66.7 76.4 33.1 62.9 44.4 46.7 59.2 66.4 61.8 30.9 67.6 64.6 52.8 36.9 49.7 50. 64.8 40.7 60.5 69.4 44.9 56.3 63.5 45.9 58.6 70.4 72.8 50.4 53.1 59.7 40.8 53.5 51.6 54. 65.5 75.6 45.6 60.6 64. 53. 22.6 50.6 45. 39.2 50.1 67.8 81.2 69. 66.5 67.3 48.4 52.2 45.8 85.4 45.7 62.4 68.4 61.2 39.1 28.5 39.3 37.9 44.1 38.4 79.6 68.2 43.4 70.6 49.1 60.1 38.3 43.7 25.5 50.3 80.8 34.5 52.6 73.9 51.3 42.5 68.8 55.4 47.3 31.3 66.8 58.9 51.2 66.1 40.4 34.1 59.5 16.4 46.9 41.3 54.8 37.3 53.8 51.7 78.7 67.4 70.5 49.9 24.4 61.9 61.6 63.9 44. 48.9 26.5 56.8 47.5 36.4 37.2 46.4 45.2 58.1 57.1 34.2 64.3 57.2 63.2 72.2 39. 30.8 41. 31.7 70.1 32. 49.2 53.3 36.5 46.3 58.7 51.4 68.6 49.4 47.1 59.3 69.3 56.1 46.1 69.7 48.5 76.6 43.2 61. 51.8 69.2 52.4 43.6 39.9 62.3 75.4 49.6 52.7 71.8 46.2 51.5 54.9 63. 54.5 62.1 62. 57.3 41.4 43.9 43.3 35.2 49. 47.7 28. 48.3 82.5 56.5 70.3 58.5 66.9 38.9 35.6 40.2 77.6 61.7 41.2 38. 50.2 42.4 71.1 71.4 32.2 38.7 37.7 68.1 69.1 43.5 73.6 56.9 59.1 57.6 59.9 42.7 36.1 53.4 36.8 60.2 58.3 36.6 57.4 63.6 72.3 62.5 44.3 33.7 61.1 63.8 58.4 53.2 68. 57.5 76.8 47. 54.7 76.2 56.6 54.3 62.7 75.2 62.8 44.6 61.3 82.9 44.7 64.4 45.4 47.6 46.8 57. 69.9 31. 65.9 47.2 42. 52.1 48.2 33. 59.8 93. 59. 48.7 59.6 55.5 78.9 73. 60.8 73.5 53.6 65.7 68.7 48. 51.1 76.3 25.1 63.7 54.2 76.7 74.3 55.3 65.1 66.2 72.9 27.1 60.9 58.2 66.6 55.9 68.9 60. 48.1 70.7 40.5 77.9 54.4 39.7 28.4 62.2 62.6 30.1 50.8 50.9 30.7 66.3 46.5 45.1 49.5 67.7 79.9 73.1 46.6 47.9 74.4 65.3 35.4 50.7 51. 70. 56.4 51.9 65.6 32.1 55.8 67.2 55. 89. 41.8 68.5 39.5 53.9 31.1 49.3 72.1 65.2 69.8 38.1 75.7 60.3 70.9 41.1 66. 75. 54.1 77. 
83.6 65.4 40.3 33.4 52.3 39.8 44.2 81.1 24.5 44.5 72.4 73.8 42.3 35.1 50.5 37.8 34. 55.7 20.2 64.5 27.3 39.6 26.7 31.8 47.4 41.7 60.4 64.1 26.6 78.4 67. 79.4 58.8 71.3 35. 47.8 75.5 77.8 78.3 64.7 78.1 49.8 71. 38.5 43.8 81.6 29.2 32.3 32.4 74.1 60.7 71.2 30.5 43.1 46. 34.6 48.8 33.9 36.2 71.9 69.6 73.3 57.7 45.3 63.4 70.2 30.2 57.9 31.6 34.8 42.1 93.4 36.7 38.2 83.5 42.2 41.6 67.9 77.2 nan 52.9 33.6 48.6 33.5 30. 57.8 10.2 40. 39.4 33.3 74.6 34.4 72.6 27.9 24.9 65. 81.5 41.5 41.9 29.9 43. 54.6 67.1 77.1 25.8 78.2 71.5 74.5 36. 74.9 79. 27.2 16. 73.7 29.8 80.9 29. 22.7 84.9 61.4 80.4 88.4 85.5 28.8 92.9 77.7 35.3 80.5 42.6 69.5 35.8 82.3 32.9 40.1 21.4 85.2 28.2 71.6 37.1 74.7 72.5 68.3 37.5 34.3 37. 61.5 38.6 26.9 17.5 38.8 74.2 32.7 30.6 67.5 80.1 26.4 76. 17.9 56. 17.1 35.7 86.6 79.3 59.4 72. 21.5 73.2 78.5 42.8 37.6 88.5 16.6 32.6 20.8 77.4 79.2 32.5 25.2 36.3 76.1 78.6 78. 31.5 75.9 23.2 90.8 27.6 22.5 29.5 21. 14.4 82.2 34.7 75.8 25. 31.4 33.2 29.1 35.5 31.9 74.8 24.2 87.1 30.4 19.9 74. 23.7 76.9 81.3 82.6 26. 29.4 84.4 27.5 22.4 81.9 23.1 79.8 71.7 28.3 88.8 84.8 25.7 89.1 77.3 21.2 27.8 87. 86.8 26.8 25.6 27.7 29.3 26.1 20.3 79.1 33.8 24.8 81.8 20.4 21.9 83.7 83.2 80.3 82.7 79.5 19.3 25.4 19.5 82.1 89.7 81.4 24.6 23.8 12. 35.9 89.5 77.5 9.7 11.2 85.3 27. 76.5 86.7 22. 28.1 82. 87.8 18.6 90. 90.2 19.7 25.9 21.1 92.8 86.4 75.1 23.4 19.4 24.1 28.7 84.1 83. 20.9 84.2 84. 88.6 21.8 83.3 31.2 22.9 85. 81. 24. 25.3 18.2 80.6 89.4 83.1 16.1 30.3 80.7 83.4 17.3 90.1 16.8 17.7 18.5 94.4 28.6 21.7 87.9 19.6 82.4 15.8 26.2 82.8 29.6 81.7 14.7 80.2 84.5 18.3 84.3 23.3 22.3 10.8 24.3 20.5 79.7 21.6 20.7 91.3 23.5 83.8 23.9 19. 13.9 23.6 88.3 28.9 87.3 18. 95.4 14.3 83.9 88. 86.9 19.1 15.4 88.7 85.1 92.2 85.7 86.1 20.1 16.7 22.1 91.5 84.6 86. 84.7 86.5 87.6 10.4 26.3 87.2 91.4 97.4 86.2 20.6 23. 18.7 16.9 19.2 15.9 85.8 90.9 86.3 13.7 18.8 13.6 27.4 18.9 89.6 21.3 85.9 87.4 17.6 93.3 17.4 88.9 19.8 11. 20. 
90.5 12.9 91.7 89.8 18.4 94.6 22.8 22.2 17.2 88.1 11.3 91.1 14.2 15.2 16.3 96.6 15.3 10. 8.4 87.5 14.5 18.1 91.2 89.9 90.4 14.6 9.4 90.3 12.3 85.6 8.1 96.1 7.6 90.6 14.8 87.7 94.9 15. 13.1 88.2 13.4 15.5 12.8 16.5 15.7 98.6 89.3 94. 94.8 15.1 11.7 96.5 17. 11.8 89.2 11.1 91.8 92. 91. 12.6 92.4 12.7 13.5 90.7 91.6 17.8 10.6 93.2] Number of unique values in Progress_Percentage are:823 ---------------------------------------------------------------------------------- Rewatch_Count [ 0. 2. 4. 5. 1. 3. 8. 7. nan 9. 12. 11. 13. 15.] Number of unique values in Rewatch_Count are:14 ---------------------------------------------------------------------------------- Enrollment_Date ['01-06-2024' '27-04-2025' '20-01-2024' '13-05-2025' '19-12-2024' '23-10-2023' '24-03-2024' '09-11-2024' '13-07-2024' '07-11-2024' '11-07-2025' '02-12-2023' '11-06-2024' '24-10-2024' '16-07-2024' '23-06-2024' '02-08-2025' '13-12-2023' '30-11-2024' '19-05-2025' '05-08-2025' '05-06-2025' '06-03-2024' '06-02-2024' '22-12-2024' '07-02-2025' '28-10-2023' '23-01-2025' '12-03-2025' '18-07-2025' '07-05-2024' '12-04-2025' '08-02-2024' '18-06-2024' '05-01-2024' '14-04-2025' '01-05-2025' '23-12-2024' '22-09-2025' '03-09-2025' '17-02-2024' '10-06-2025' '28-12-2023' '19-09-2024' '25-02-2025' '23-05-2025' '06-12-2024' '13-10-2024' '11-12-2023' '05-02-2024' '08-03-2024' '02-01-2024' '25-01-2024' '22-05-2024' '19-10-2023' '24-01-2025' '08-05-2025' '15-12-2023' '02-09-2024' '24-04-2024' '13-06-2024' '31-05-2025' '20-08-2025' '10-07-2024' '14-07-2024' '27-01-2025' '26-10-2023' '31-12-2023' '08-10-2024' '26-03-2024' '04-07-2025' '18-08-2024' '03-02-2024' '19-05-2024' '08-12-2023' '01-12-2023' '11-07-2024' '27-12-2024' '23-09-2025' '04-06-2025' '27-08-2024' '26-11-2023' '30-01-2024' '15-06-2024' '12-12-2024' '11-12-2024' '04-02-2024' '31-05-2024' '24-08-2024' '28-11-2024' '28-07-2024' '18-04-2025' '18-06-2025' '05-01-2025' '29-05-2025' '18-10-2023' '02-09-2025' '27-08-2025' '21-02-2024' '15-09-2024' '30-04-2024' 
'12-04-2024' '17-10-2023' '01-06-2025' '11-01-2025' '07-04-2024' '12-07-2025' '11-11-2023' '30-12-2023' '03-05-2024' '18-08-2025' '28-09-2025' '18-02-2024' '12-01-2025' '26-12-2024' '14-02-2025' '30-07-2025' '19-07-2025' '04-08-2024' '22-04-2025' '26-09-2025' '07-12-2024' '15-05-2024' '19-11-2024' '25-06-2025' '04-12-2024' '15-07-2025' '16-05-2025' '31-01-2025' '20-05-2024' '08-11-2024' '08-01-2024' '09-10-2024' '14-08-2025' '08-05-2024' '12-02-2025' '05-07-2024' '11-02-2024' '31-08-2024' '26-11-2024' '20-07-2025' '16-09-2025' '18-05-2025' '30-10-2024' '20-12-2024' '28-07-2025' '03-04-2025' '23-02-2025' '17-07-2024' '13-01-2024' '03-02-2025' '15-11-2023' '21-05-2024' '29-04-2024' '07-05-2025' '02-03-2025' '30-05-2024' '03-06-2025' '25-12-2024' '20-07-2024' '01-05-2024' '26-07-2024' '27-01-2024' '21-08-2024' '23-02-2024' '03-08-2024' '22-12-2023' '18-09-2025' '23-08-2024' '16-12-2024' '04-04-2025' '02-07-2025' '01-01-2024' '16-06-2025' '01-08-2024' '15-07-2024' '26-08-2024' '07-08-2024' '02-10-2025' '07-01-2025' '11-09-2024' '06-09-2025' '04-09-2025' '13-04-2024' '27-07-2024' '03-03-2025' '09-09-2024' '02-05-2025' '12-09-2024' '30-03-2025' '06-08-2024' '30-08-2024' '03-06-2024' '22-03-2024' '22-07-2024' '22-06-2024' '21-06-2025' '27-12-2023' '29-09-2024' '17-08-2024' '17-03-2024' '04-05-2025' '27-04-2024' '10-12-2024' '10-11-2024' '20-03-2024' '12-06-2025' '09-01-2025' '12-03-2024' '25-09-2025' '27-02-2025' '24-07-2025' '18-10-2024' '15-09-2025' '10-06-2024' '29-08-2025' '04-10-2024' '13-11-2024' '21-08-2025' '05-08-2024' '29-07-2025' '22-08-2025' '08-08-2025' '14-08-2024' '29-02-2024' '21-02-2025' '04-06-2024' '17-01-2024' '06-06-2025' '20-12-2023' '08-12-2024' '19-04-2025' '29-09-2025' '23-03-2024' '08-07-2025' '26-05-2024' '26-01-2024' '19-02-2024' '01-08-2025' '06-05-2025' '27-02-2024' '23-08-2025' '16-06-2024' '09-12-2023' '21-12-2023' '15-06-2025' '07-07-2025' '17-11-2023' '13-09-2024' '17-03-2025' '21-01-2025' '01-09-2024' '13-02-2025' '01-01-2025' 
'14-05-2025' '28-08-2025' '19-03-2025' '12-11-2023' '28-08-2024' '11-03-2024' '26-07-2025' '05-11-2024' '24-03-2025' '10-09-2025' '13-05-2024' '04-07-2024' '30-12-2024' '08-06-2024' '25-11-2024' '06-04-2024' '22-03-2025' '07-08-2025' '06-05-2024' '17-05-2025' '04-09-2024' '11-10-2024' '24-11-2023' '01-02-2025' '28-11-2023' '26-04-2024' '19-12-2023' '02-10-2024' '26-06-2024' '27-03-2024' '19-08-2025' '11-01-2024' '17-09-2024' '17-09-2025' '20-03-2025' '23-05-2024' '05-03-2024' '28-01-2024' '08-04-2025' '10-07-2025' '19-04-2024' '18-01-2025' '08-02-2025' '10-01-2025' '14-03-2025' '27-06-2025' '20-02-2025' '28-03-2025' '06-01-2025' '02-11-2024' '14-11-2023' '12-08-2025' '05-11-2023' '01-07-2025' '11-03-2025' '27-05-2025' '25-04-2024' '07-04-2025' '16-10-2024' '21-07-2025' '21-09-2024' '27-11-2024' '09-08-2025' '04-12-2023' '15-11-2024' '27-10-2024' '24-12-2024' '18-04-2024' '16-09-2024' '29-11-2023' '14-12-2023' '25-03-2025' '19-11-2023' '10-04-2024' '08-09-2024' '09-08-2024' '03-10-2025' '05-03-2025' '06-02-2025' '23-12-2023' '05-02-2025' '03-05-2025' '02-01-2025' '24-07-2024' '02-06-2025' '08-08-2024' '17-12-2023' '12-05-2024' '07-09-2024' '27-03-2025' '07-12-2023' '27-06-2024' '09-11-2023' '19-06-2024' '31-07-2024' '25-10-2024' '22-05-2025' '29-07-2024' '24-01-2024' '10-05-2025' '21-03-2025' '05-07-2025' '03-11-2023' '20-09-2025' '18-11-2023' '10-02-2025' '28-04-2025' '13-09-2025' '25-06-2024' '16-08-2024' '28-02-2025' '23-07-2024' '14-03-2024' '18-02-2025' '08-06-2025' '13-12-2024' '15-08-2025' '16-01-2024' '23-11-2023' '25-04-2025' '16-03-2025' '20-05-2025' '05-12-2024' '01-09-2025' '17-02-2025' '21-10-2024' '29-04-2025' '02-04-2024' '12-09-2025' '20-10-2023' '03-10-2024' '26-12-2023' '03-03-2024' '06-11-2024' '21-04-2025' '16-04-2024' '02-02-2025' '05-12-2023' '12-06-2024' '28-05-2025' '18-12-2023' '19-01-2025' '09-06-2025' '28-04-2024' '21-06-2024' '19-10-2024' '04-08-2025' '22-04-2024' '23-07-2025' '27-05-2024' '31-07-2025' '02-11-2023' '16-08-2025' 
'09-04-2024' '20-09-2024' '16-05-2024' '17-06-2025' '04-11-2024' '27-07-2025' '19-07-2024' '20-06-2025' '11-05-2025' '04-03-2025' '21-12-2024' '30-06-2024' '09-07-2025' '18-09-2024' '13-08-2025' '27-09-2025' '30-07-2024' '18-05-2024' '22-08-2024' '24-09-2025' '09-09-2025' '11-09-2025' '12-01-2024' '22-02-2025' '17-11-2024' '11-04-2024' '29-05-2024' '07-03-2025' '29-01-2024' '08-01-2025' '28-12-2024' '19-06-2025' '11-08-2024' '26-03-2025' '16-12-2023' '18-07-2024' '28-10-2024' '26-09-2024' '24-02-2025' '03-12-2023' '26-01-2025' '31-10-2024' '22-01-2024' '25-10-2023' '01-03-2024' '20-02-2024' '29-12-2024' '30-06-2025' '29-10-2023' '21-05-2025' '15-12-2024' '29-08-2024' '17-06-2024' '12-02-2024' '04-10-2025' '04-04-2024' '12-12-2023' '12-11-2024' '22-11-2023' '16-02-2024' '04-02-2025' '10-08-2024' '04-05-2024' '10-03-2024' '01-04-2025' '23-11-2024' '25-08-2025' '03-09-2024' '06-09-2024' '11-05-2024' '15-10-2024' '09-07-2024' '13-01-2025' '26-05-2025' '16-11-2023' '28-06-2024' '30-09-2025' '06-11-2023' '06-07-2025' '15-04-2024' '11-08-2025' '11-06-2025' '13-08-2024' '08-03-2025' '06-10-2024' '15-01-2024' '30-05-2025' '20-11-2023' '01-02-2024' '29-03-2024' '27-09-2024' '03-01-2025' '23-01-2024' '14-01-2024' '19-02-2025' '12-07-2024' '30-08-2025' '23-04-2025' '17-04-2025' '21-11-2023' '24-05-2025' '04-01-2025' '24-12-2023' '14-09-2024' '09-02-2024' '25-11-2023' '01-11-2024' '18-03-2025' '09-05-2024' '20-10-2024' '01-04-2024' '16-07-2025' '25-09-2024' '22-06-2025' '25-02-2024' '21-04-2024' '14-02-2024' '07-01-2024' '03-12-2024' '15-04-2025' '10-09-2024' '25-08-2024' '01-10-2024' '28-02-2024' '16-04-2025' '24-08-2025' '09-03-2025' '08-04-2024' '14-07-2025' '28-03-2024' '31-03-2025' '03-08-2025' '16-01-2025' '04-11-2023' '17-08-2025' '22-01-2025' '09-12-2024' '21-10-2023' '31-01-2024' '24-06-2025' '17-07-2025' '30-10-2023' '10-12-2023' '10-08-2025' '25-05-2025' '19-01-2024' '10-02-2024' '05-10-2025' '07-07-2024' '07-06-2025' '29-06-2025' '25-03-2024' '09-03-2024' 
'01-07-2024' '02-05-2024' '26-04-2025' '29-03-2025' '08-11-2023' '02-08-2024' '23-06-2025' '20-04-2024' '13-04-2025' '30-11-2023' '16-02-2025' '24-10-2023' '02-07-2024' '17-04-2024' '18-01-2024' '06-03-2025' '01-11-2023' '28-09-2024' '05-09-2025' '15-02-2025' '20-06-2024' '13-07-2025' '23-03-2025' '30-01-2025' '10-03-2025' '18-03-2024' '22-07-2025' '06-07-2024' '13-03-2024' '29-12-2023' '05-09-2024' '24-11-2024' '22-10-2024' '10-01-2024' '21-07-2024' '05-06-2024' '04-03-2024' '30-04-2025' '06-01-2024' '09-02-2025' '14-11-2024' '24-04-2025' '07-10-2024' '23-04-2024' '20-04-2025' '01-03-2025' '06-12-2023' '02-02-2024' '25-07-2024' '04-01-2024' '23-09-2024' '22-11-2024' '16-03-2024' '13-03-2025' '05-04-2025' '31-10-2023' '17-12-2024' '09-05-2025' '15-01-2025' '22-10-2023' '08-07-2024' '26-02-2024' '25-07-2025' '03-11-2024' '01-10-2025' '06-08-2025' '10-10-2024' '26-02-2025' '07-03-2024' '26-08-2025' '13-06-2025' '20-08-2024' '11-04-2025' '31-12-2024' '20-11-2024' '21-09-2025' '03-07-2025' '14-01-2025' '06-06-2024' '25-05-2024' '05-05-2025' '06-10-2025' '10-11-2023' '27-11-2023' '03-01-2024' '31-03-2024' '06-04-2025' '21-01-2024' '19-09-2025' '08-09-2025' '29-11-2024' '09-06-2024' '25-12-2023' '11-11-2024' '18-11-2024' '30-09-2024' '24-06-2024' '07-11-2023' '14-06-2024' '20-01-2025' '28-01-2025' '15-03-2025' '24-02-2024' '19-03-2024' '02-12-2024' '19-08-2024' '26-06-2025' '07-09-2025' '21-11-2024' '30-03-2024' '27-10-2023' '01-12-2024' '09-04-2025' '28-06-2025' '22-02-2024' '02-06-2024' '14-06-2025' '22-09-2024' '17-10-2024' '24-09-2024' '14-04-2024' '07-02-2024' '09-01-2024' '10-05-2024' '03-04-2024' '03-07-2024' '07-06-2024' '12-05-2025' '28-05-2024' '12-10-2024' '23-10-2024' '14-09-2025' '14-12-2024' '05-10-2024' '17-05-2024' '15-03-2024' '14-10-2024' '29-06-2024' '31-08-2025' '26-10-2024' '24-05-2024' '29-10-2024' '15-02-2024' '18-12-2024' '11-02-2025' '13-11-2023' '16-11-2024' '15-08-2024' '13-02-2024' '02-03-2024' '12-08-2024' '10-04-2025' '14-05-2024' 
'05-04-2024' '05-05-2024' '15-05-2025' '25-01-2025' '21-03-2024' '17-01-2025' '29-01-2025' '02-04-2025'] Number of unique values in Enrollment_Date are:721 ---------------------------------------------------------------------------------- Payment_Mode ['Scholarship' 'Credit Card' 'NetBanking' 'UPI' 'Debit Card' 'Free'] Number of unique values in Payment_Mode are:6 ---------------------------------------------------------------------------------- Fee_Paid ['No' 'Yes'] Number of unique values in Fee_Paid are:2 ---------------------------------------------------------------------------------- Discount_Used ['No' 'Yes'] Number of unique values in Discount_Used are:2 ---------------------------------------------------------------------------------- Payment_Amount [1740 6147 4280 ... 1786 2336 2424] Number of unique values in Payment_Amount are:6884 ---------------------------------------------------------------------------------- App_Usage_Percentage [ 49. 86. 85. 42. 91. 74. 83. 68. 78. 48. 88. 60. 100. 32. 40. 59. 77. 61. 70. 66. 46. 38. 93. 89. 67. 87. 47. 25. 51. 29. 57. 62. 79. 76. 90. 65. 82. 54. 73. 50. 98. 80. 69. 35. 56. 63. 81. 75. 28. 64. 45. 39. 31. 44. 71. 97. 34. 41. 96. 12. 94. 24. 58. 92. 99. 84. 72. 30. 95. 55. 53. 43. 52. 36. 37. 15. 33. nan 7. 21. 22. 13. 26. 27. 14. 23. 16. 19. 20. 17. 11. 18. 5. 10. 2. 8. 4. 3. 0. 9.] Number of unique values in App_Usage_Percentage are:100 ---------------------------------------------------------------------------------- Reminder_Emails_Clicked [ 3 0 1 2 5 4 7 6 9 8 11 13 10 12] Number of unique values in Reminder_Emails_Clicked are:14 ---------------------------------------------------------------------------------- Support_Tickets_Raised [4 0 3 2 1 5 6 7 8] Number of unique values in Support_Tickets_Raised are:9 ---------------------------------------------------------------------------------- Satisfaction_Rating [3.5 4.5 5. 3.8 4. 4.8 4.7 4.4 4.6 3.6 4.2 3.4 nan 3.9 4.3 4.9 3.7 4.1 2.9 3.1 3.3 3. 
2.4 2.1 3.2 1.4 2. 1.9 2.7 2.3 2.2 2.6 2.5 1.8 1.6 1.5 1.3 1. 1.1 1.2] Number of unique values in Satisfaction_Rating are:40 ---------------------------------------------------------------------------------- Completed ['Completed' 'Not Completed'] Number of unique values in Completed are:2 ----------------------------------------------------------------------------------
Discount_Used
No 80058
Yes 19942
Name: count, dtype: int64
| Fee_Paid (rows) / Discount_Used (columns) | No | Yes |
|---|---|---|
| No | 23948 | 5902 |
| Yes | 56110 | 14040 |
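A table of this shape is what `pd.crosstab` produces. The sketch below is a minimal illustration on a made-up toy frame; only the two column names come from the notebook, and the variable names `toy` and `fee_vs_discount` are assumptions for the example.

```python
import pandas as pd

# Toy frame standing in for the course dataset; the rows are invented.
toy = pd.DataFrame({
    "Fee_Paid": ["No", "Yes", "Yes", "Yes", "No", "Yes"],
    "Discount_Used": ["No", "No", "Yes", "No", "Yes", "No"],
})

# Rows are Fee_Paid, columns are Discount_Used, cells are record counts.
fee_vs_discount = pd.crosstab(toy["Fee_Paid"], toy["Discount_Used"])
print(fee_vs_discount)
```

The column totals of the crosstab should match the `value_counts()` of `Discount_Used`, which is a quick consistency check (here 23948 + 56110 = 80058 and 5902 + 14040 = 19942).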
23
17
NULL VALUE TREATMENT¶
np.float64(10.712000000000002)
Inference:
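- The per-column null percentages sum to about 10.7%. No single column is missing more than 5% of its values (Age is the highest at 4.774%), and across all 40 columns roughly 0.27% of cells are null.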
Student_ID 0.000
Name 0.000
Gender 0.000
Age 4.774
Education_Level 0.000
Employment_Status 0.000
City 0.000
Device_Type 0.000
Internet_Connection_Quality 0.000
Course_ID 0.000
Course_Name 0.000
Category 0.000
Course_Level 0.000
Course_Duration_Days 0.000
Instructor_Rating 0.000
Login_Frequency 0.112
Average_Session_Duration_Min 0.000
Video_Completion_Rate 0.000
Discussion_Participation 0.955
Time_Spent_Hours 0.000
Days_Since_Last_Login 0.000
Notifications_Checked 0.000
Peer_Interaction_Score 0.000
Assignments_Submitted 0.952
Assignments_Missed 0.000
Quiz_Attempts 0.000
Quiz_Score_Avg 0.000
Project_Grade 0.000
Progress_Percentage 0.335
Rewatch_Count 2.476
Enrollment_Date 0.000
Payment_Mode 0.000
Fee_Paid 0.000
Discount_Used 0.000
Payment_Amount 0.000
App_Usage_Percentage 0.026
Reminder_Emails_Clicked 0.000
Support_Tickets_Raised 0.000
Satisfaction_Rating 1.082
Completed 0.000
dtype: float64
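The figure 10.712 printed above equals the sum of these per-column percentages, which is different from the share of all cells that are null. A minimal sketch of both calculations follows, on an invented toy frame; the helper name `null_percentages` and the toy values are assumptions, not the notebook's own code.

```python
import numpy as np
import pandas as pd

def null_percentages(df: pd.DataFrame) -> pd.Series:
    """Percentage of missing values in each column."""
    return df.isnull().mean() * 100

# Toy data: 1 null in Age, 2 nulls in Rewatch_Count, none in City.
toy = pd.DataFrame({
    "Age": [21.0, np.nan, 25.0, 30.0],
    "Rewatch_Count": [1.0, 2.0, np.nan, np.nan],
    "City": ["Delhi", "Surat", "Indore", "Chennai"],
})

per_column = null_percentages(toy)

# Share of all cells that are null (3 of 12 cells here).
overall = toy.isnull().sum().sum() / toy.size * 100

print(per_column)
print("sum of column percentages:", per_column.sum())
print("overall percentage of null cells:", overall)
```

Summing the column percentages overstates how sparse the data is, so the overall cell percentage is usually the safer number to quote.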
AGE
np.float64(0.417360783162141)
mean 25.335948 median 25.000000 Name: Age, dtype: float64
Age
17.0 9145
25.0 6649
26.0 6616
24.0 6460
27.0 6325
28.0 6168
23.0 6043
29.0 5650
22.0 5507
30.0 4973
21.0 4919
20.0 4427
31.0 4384
19.0 3731
33.0 3123
18.0 3122
34.0 2410
35.0 1869
36.0 1393
38.0 769
39.0 564
40.0 373
41.0 237
42.0 147
43.0 85
44.0 68
45.0 35
46.0 24
48.0 5
49.0 4
52.0 1
Name: count, dtype: int64
Inference
- Age is slightly right-skewed (skewness ≈ 0.42), and most learners fall in the 17 to 30 age group.
0
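The `0` above is the null count after imputation; the notebook does not show which fill strategy produced it. One common approach, sketched here purely as an assumption, is to fill with the median when a column is noticeably skewed and with the mean otherwise. The helper `fill_numeric_nulls` and the 0.5 threshold are hypothetical choices for illustration.

```python
import numpy as np
import pandas as pd

def fill_numeric_nulls(series: pd.Series, skew_threshold: float = 0.5) -> pd.Series:
    """Fill nulls with the median if |skew| exceeds the threshold, else the mean."""
    skew = series.skew()
    # The median is less sensitive to a long tail than the mean.
    fill_value = series.median() if abs(skew) > skew_threshold else series.mean()
    return series.fillna(fill_value)

# Toy ages with one extreme value and one missing entry.
ages = pd.Series([17, 18, 19, 20, 21, 52, np.nan], name="Age")
filled = fill_numeric_nulls(ages)

print(filled.isnull().sum())  # null count after the fill
```

The same pattern (skewness, mean versus median, value counts, then a null check) repeats for each column in this section.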
Login_Frequency
np.float64(0.42700657919129203)
mean 4.77703 median 5.00000 Name: Login_Frequency, dtype: float64
Login_Frequency
4.0 21566
5.0 20514
3.0 16089
6.0 15306
7.0 9209
2.0 7677
8.0 4630
9.0 2110
1.0 1706
10.0 776
11.0 251
0.0 43
14.0 8
15.0 3
Name: count, dtype: int64
0
Discussion_Participation
np.float64(0.625828033057328)
mean 2.283619 median 2.000000 Name: Discussion_Participation, dtype: float64
Discussion_Participation
2.0 25525
1.0 22936
3.0 19547
4.0 11710
0.0 10745
5.0 5718
6.0 2453
8.0 307
9.0 98
11.0 4
12.0 2
Name: count, dtype: int64
0
Assignments_Submitted
np.float64(-0.04468387727492165)
mean 4.734826 median 5.000000 Name: Assignments_Submitted, dtype: float64
np.float64(5.0)
Assignments_Submitted
5.0 23100
4.0 21248
6.0 17972
3.0 14098
7.0 10064
2.0 6457
8.0 4000
1.0 1755
0.0 230
10.0 124
Name: count, dtype: int64
Assignments_Submitted
5.0 23335
4.0 21463
6.0 18130
3.0 14243
7.0 10151
2.0 6518
8.0 4036
1.0 1769
0.0 231
10.0 124
Name: count, dtype: int64
0
Progress_Percentage
np.float64(-0.04771359142045723)
mean 53.827156 median 54.000000 Name: Progress_Percentage, dtype: float64
Progress_Percentage
56.7 361
53.8 349
57.7 346
53.6 336
54.9 333
...
90.7 1
91.6 1
17.8 1
10.6 1
93.2 1
Name: count, Length: 822, dtype: int64
0
Rewatch_Count
np.float64(0.6820974783722953)
mean 2.229492 median 2.000000 Name: Rewatch_Count, dtype: float64
Rewatch_Count
2.0 25704
1.0 22993
3.0 19694
4.0 11588
0.0 10619
5.0 5694
7.0 866
8.0 260
9.0 96
11.0 6
12.0 2
13.0 1
15.0 1
Name: count, dtype: int64
0
App_Usage_Percentage
np.float64(-0.2466866425852851)
mean 67.876098 median 68.000000 Name: App_Usage_Percentage, dtype: float64
App_Usage_Percentage
100.0 6127
72.0 2086
68.0 2044
65.0 1961
64.0 1949
...
5.0 21
7.0 18
4.0 15
3.0 9
2.0 8
Name: count, Length: 99, dtype: int64
0
Satisfaction_Rating
np.float64(-0.6078686559974532)
mean 4.147066 median 4.200000 Name: Satisfaction_Rating, dtype: float64
Satisfaction_Rating
5.0 17616
4.2 4914
4.3 4862
4.4 4858
4.1 4835
4.0 4796
4.5 4637
3.9 4563
4.6 4489
3.8 4233
4.7 4069
3.7 4042
3.6 3778
4.8 3772
3.5 3450
4.9 3443
3.4 3085
3.3 2581
3.2 2279
3.1 1964
3.0 1566
2.9 1343
2.7 880
2.6 746
2.5 530
2.4 430
2.3 329
2.2 279
2.1 192
2.0 140
1.9 92
1.8 59
1.6 24
1.5 18
1.3 10
1.4 6
1.0 6
1.1 1
1.2 1
Name: count, dtype: int64
0
np.int64(0)
Outlier Detection¶
Inference
- Time_Spent_Hours
- Days_Since_Last_Login
- Quiz_Score_Avg
- Project_Grade
- Progress_Percentage
The columns listed above have the largest number of outliers.
Inference
- Age is right-skewed with most students aged 18–28.
- Course durations occur in fixed predefined values.
- Instructor ratings fall between 4.1 and 4.7.
- Login frequency is low to moderate for most students.
- Session duration centers around ~34 minutes.
- Video completion rate is moderate for most students.
- Discussion participation is very low overall.
- Time spent is right-skewed, with three quarters of students at about 6 hours or less.
- Days since last login is heavily right-skewed.
- Notifications checked is low for most students.
- Peer interaction scores follow a mild bell-shaped pattern.
- Assignments submitted shows clustered counts.
- Assignments missed is low for most students.
- Quiz attempts vary but mostly remain under 10.
- Quiz score average forms a strong bell-curve.
- Project grades show a near-normal distribution.
- Progress percentage is normally distributed.
- Rewatch count is low for most students.
- Payment amounts cluster toward lower ranges.
- App usage percentage is moderately distributed.
- Reminder emails clicked is very low for most users.
- Support tickets raised is extremely low.
- Satisfaction rating is left-skewed (skewness ≈ -0.61), with mostly positive ratings.
Skewness of the column Age is : 0.4365762039036042
Skewness of the column Course_Duration_Days is : 0.47089430960814793
Skewness of the column Instructor_Rating is : -0.40421274031438154
Skewness of the column Login_Frequency is : 0.426828197112078
Skewness of the column Average_Session_Duration_Min is : 0.019499390538360858
Skewness of the column Video_Completion_Rate is : -0.3644220917501344
Skewness of the column Discussion_Participation is : 0.6338066082585112
Skewness of the column Time_Spent_Hours is : 1.1146121667660545
Skewness of the column Days_Since_Last_Login is : 2.295673564912052
Skewness of the column Notifications_Checked is : 0.5083082227197634
Skewness of the column Peer_Interaction_Score is : -0.17442136310527653
Skewness of the column Assignments_Submitted is : -0.04404763707114829
Skewness of the column Assignments_Missed is : -0.059119619329094875
Skewness of the column Quiz_Attempts is : 0.5715377344295525
Skewness of the column Quiz_Score_Avg is : -0.09184007660386516
Skewness of the column Project_Grade is : -0.11886932931229066
Skewness of the column Progress_Percentage is : -0.04773031995606081
Skewness of the column Rewatch_Count is : 0.6821754556498142
Skewness of the column Payment_Amount is : -0.3692041551572178
Skewness of the column App_Usage_Percentage is : -0.24672377018863956
Skewness of the column Reminder_Emails_Clicked is : 0.7273282007663202
Skewness of the column Support_Tickets_Raised is : 1.1340954795148297
Skewness of the column Satisfaction_Rating is : -0.6070948605295641
1. Using IQR Method
18226
- The IQR method flags 18,226 records as outliers.
81774
Inference:
- The remaining 81,774 records are non-outliers (18,226 + 81,774 = 100,000).
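The notebook does not show how the IQR rule was applied, so the following is only a sketch under stated assumptions: the conventional 1.5 × IQR fences, and a row counted as an outlier when any of the checked columns falls outside its fences. The helper `iqr_outlier_mask` and the toy values are hypothetical.

```python
import pandas as pd

def iqr_outlier_mask(df: pd.DataFrame, columns: list, k: float = 1.5) -> pd.Series:
    """True for rows where at least one listed column lies outside its IQR fences."""
    q1 = df[columns].quantile(0.25)
    q3 = df[columns].quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    # Comparisons align on column names, giving one boolean per cell.
    outside = (df[columns] < lower) | (df[columns] > upper)
    return outside.any(axis=1)

# Toy data: row 0 has an unusually low quiz score, row 7 an unusually long time spent.
toy = pd.DataFrame({
    "Time_Spent_Hours": [0.5, 1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 25.0],
    "Quiz_Score_Avg": [20, 72, 74, 75, 76, 78, 80, 70],
})

mask = iqr_outlier_mask(toy, ["Time_Spent_Hours", "Quiz_Score_Avg"])
outliers = toy[mask]
non_outliers = toy[~mask]
print(len(outliers), len(non_outliers))
```

Because the mask partitions the rows, the two counts always add up to the total number of records, which is the check applied to the 18,226 and 81,774 figures above.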
Univariate Analysis¶
Numeric¶
HistPlot
Inference
- Age is right-skewed with most students aged 18–28.
- Course durations occur in fixed predefined values.
- Instructor ratings fall between 4.1 and 4.7.
- Login frequency is low to moderate for most students.
- Session duration centers around ~34 minutes.
- Video completion rate is moderate for most students.
- Discussion participation is very low overall.
- Time spent is right-skewed, with three quarters of students at about 6 hours or less.
- Days since last login is heavily right-skewed.
- Notifications checked is low for most students.
- Peer interaction scores follow a mild bell-shaped pattern.
- Assignments submitted shows clustered counts.
- Assignments missed is low for most students.
- Quiz attempts vary but mostly remain under 10.
- Quiz score average forms a strong bell-curve.
- Project grades show a near-normal distribution.
- Progress percentage is normally distributed.
- Rewatch count is low for most students.
- Payment amounts cluster toward lower ranges.
- App usage percentage is moderately distributed.
- Reminder emails clicked is very low for most users.
- Support tickets raised is extremely low.
- Satisfaction rating is left-skewed (skewness ≈ -0.61), with mostly positive ratings.
DistPlot
Inference:
- Many of the graphs are skewed, which indicates the presence of outliers.
KDEPlot
Inference:
- Many of the graphs are skewed, which indicates the presence of outliers.
Inference:
- Outliers are present in almost all the graphs, except for the columns Course_Duration_Days, Instructor_Rating, and Video_Completion_Rate.
| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Age | 95226.0 | 25.335948 | 5.472524 | 17.0 | 21.0 | 25.0 | 29.0 | 52.0 |
| Course_Duration_Days | 100000.0 | 51.817300 | 20.324801 | 25.0 | 30.0 | 45.0 | 60.0 | 90.0 |
| Instructor_Rating | 100000.0 | 4.444478 | 0.202631 | 4.1 | 4.3 | 4.5 | 4.6 | 4.7 |
| Login_Frequency | 99888.0 | 4.777030 | 1.832363 | 0.0 | 3.0 | 5.0 | 6.0 | 15.0 |
| Average_Session_Duration_Min | 100000.0 | 33.878180 | 10.341964 | 5.0 | 27.0 | 34.0 | 41.0 | 81.0 |
| Video_Completion_Rate | 100000.0 | 62.174580 | 19.558126 | 5.0 | 48.5 | 64.0 | 77.5 | 99.9 |
| Discussion_Participation | 99045.0 | 2.283619 | 1.528586 | 0.0 | 1.0 | 2.0 | 3.0 | 12.0 |
| Time_Spent_Hours | 100000.0 | 3.873632 | 3.781185 | 0.5 | 0.5 | 2.7 | 6.2 | 25.6 |
| Days_Since_Last_Login | 100000.0 | 6.188860 | 6.982047 | 0.0 | 1.0 | 4.0 | 9.0 | 99.0 |
| Notifications_Checked | 100000.0 | 5.232110 | 2.401486 | 0.0 | 4.0 | 5.0 | 7.0 | 18.0 |
| Peer_Interaction_Score | 100000.0 | 6.294509 | 1.977552 | 0.0 | 4.9 | 6.3 | 7.7 | 10.0 |
| Assignments_Submitted | 99048.0 | 4.734826 | 1.620023 | 0.0 | 4.0 | 5.0 | 6.0 | 10.0 |
| Assignments_Missed | 100000.0 | 5.123450 | 1.692808 | 0.0 | 4.0 | 5.0 | 6.0 | 10.0 |
| Quiz_Attempts | 100000.0 | 3.772330 | 2.021276 | 0.0 | 2.0 | 4.0 | 5.0 | 16.0 |
| Quiz_Score_Avg | 100000.0 | 73.276201 | 12.552344 | 19.6 | 64.7 | 73.3 | 82.0 | 100.0 |
| Project_Grade | 100000.0 | 68.189534 | 15.312036 | 0.0 | 57.7 | 68.3 | 78.8 | 100.0 |
| Progress_Percentage | 99665.0 | 53.827156 | 12.515789 | 7.6 | 45.3 | 54.0 | 62.5 | 98.6 |
| Rewatch_Count | 97524.0 | 2.229492 | 1.482424 | 0.0 | 1.0 | 2.0 | 3.0 | 15.0 |
| Payment_Amount | 100000.0 | 3253.427120 | 2084.391775 | 0.0 | 1242.0 | 3715.0 | 4685.0 | 7149.0 |
| App_Usage_Percentage | 99974.0 | 67.876098 | 19.113137 | 0.0 | 55.0 | 68.0 | 82.0 | 100.0 |
| Reminder_Emails_Clicked | 100000.0 | 2.332650 | 1.584626 | 0.0 | 1.0 | 2.0 | 3.0 | 13.0 |
| Support_Tickets_Raised | 100000.0 | 0.870980 | 0.951569 | 0.0 | 0.0 | 1.0 | 1.0 | 8.0 |
| Satisfaction_Rating | 98918.0 | 4.147066 | 0.689646 | 1.0 | 3.7 | 4.2 | 4.8 | 5.0 |
Skewness of the column Age is : 0.4365762039036042
Skewness of the column Course_Duration_Days is : 0.47089430960814793
Skewness of the column Instructor_Rating is : -0.40421274031438154
Skewness of the column Login_Frequency is : 0.426828197112078
Skewness of the column Average_Session_Duration_Min is : 0.019499390538360858
Skewness of the column Video_Completion_Rate is : -0.3644220917501344
Skewness of the column Discussion_Participation is : 0.6338066082585112
Skewness of the column Time_Spent_Hours is : 1.1146121667660545
Skewness of the column Days_Since_Last_Login is : 2.295673564912052
Skewness of the column Notifications_Checked is : 0.5083082227197634
Skewness of the column Peer_Interaction_Score is : -0.17442136310527653
Skewness of the column Assignments_Submitted is : -0.04404763707114829
Skewness of the column Assignments_Missed is : -0.059119619329094875
Skewness of the column Quiz_Attempts is : 0.5715377344295525
Skewness of the column Quiz_Score_Avg is : -0.09184007660386516
Skewness of the column Project_Grade is : -0.11886932931229066
Skewness of the column Progress_Percentage is : -0.04773031995606081
Skewness of the column Rewatch_Count is : 0.6821754556498142
Skewness of the column Payment_Amount is : -0.3692041551572178
Skewness of the column App_Usage_Percentage is : -0.24672377018863956
Skewness of the column Reminder_Emails_Clicked is : 0.7273282007663202
Skewness of the column Support_Tickets_Raised is : 1.1340954795148297
Skewness of the column Satisfaction_Rating is : -0.6070948605295641
Kurtosis of the column Age is : 0.002370893226747217
Kurtosis of the column Course_Duration_Days is : -0.8230337824788738
Kurtosis of the column Instructor_Rating is : -1.2105761221919056
Kurtosis of the column Login_Frequency is : 0.08374155311262621
Kurtosis of the column Average_Session_Duration_Min is : -0.07360773413163768
Kurtosis of the column Video_Completion_Rate is : -0.5538516410283587
Kurtosis of the column Discussion_Participation is : 0.37972877691821516
Kurtosis of the column Time_Spent_Hours is : 0.7348942042977593
Kurtosis of the column Days_Since_Last_Login is : 8.583771205154857
Kurtosis of the column Notifications_Checked is : 0.3155318308388493
Kurtosis of the column Peer_Interaction_Score is : -0.33895835091637583
Kurtosis of the column Assignments_Submitted is : -0.34247000403151207
Kurtosis of the column Assignments_Missed is : -0.2223412316831186
Kurtosis of the column Quiz_Attempts is : 0.3395520800140801
Kurtosis of the column Quiz_Score_Avg is : -0.24490779684850494
Kurtosis of the column Project_Grade is : -0.2379706935237178
Kurtosis of the column Progress_Percentage is : -0.17402158704955362
Kurtosis of the column Rewatch_Count is : 0.8293874576727518
Kurtosis of the column Payment_Amount is : -1.0156739316560142
Kurtosis of the column App_Usage_Percentage is : -0.4018101772870635
Kurtosis of the column Reminder_Emails_Clicked is : 0.606826976067909
Kurtosis of the column Support_Tickets_Raised is : 1.3845232097782572
Kurtosis of the column Satisfaction_Rating is : -0.16736313719962626
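Per-column skewness and kurtosis like the values above can be collected into a single table rather than printed line by line. This is a sketch using pandas' built-in `skew()` and `kurt()` (the latter reports excess kurtosis, so a normal distribution scores about 0); whether the notebook used pandas or another library for these numbers is not shown, and the helper `shape_summary` and toy data are assumptions.

```python
import pandas as pd

def shape_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Skewness and excess kurtosis for every numeric column."""
    numeric = df.select_dtypes(include="number")
    return pd.DataFrame({
        "skewness": numeric.skew(),
        "kurtosis": numeric.kurt(),
    })

# Toy data: one symmetric column, one with a long right tail, one non-numeric.
toy = pd.DataFrame({
    "symmetric": [1, 2, 3, 4, 5],
    "right_tailed": [0, 0, 1, 1, 10],
    "label": list("abcde"),
})

summary = shape_summary(toy)
print(summary)
```

Sorting the result, for example with `summary.sort_values("skewness")`, makes the most skewed columns such as Days_Since_Last_Login easy to spot.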
Categorical¶
Unique values and number of unique values for categorical columns
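The outputs that follow print `<bound method Series.unique of ...>`, which is what pandas shows when `unique` is referenced without parentheses, so the whole Series is echoed instead of its distinct values. A minimal sketch of the intended loop, calling `unique()` and `nunique()`, is given below; the helper `unique_summary` and the toy frame are assumptions for illustration.

```python
import pandas as pd

def unique_summary(df: pd.DataFrame) -> dict:
    """Map each non-numeric column to (list of distinct values, count of distinct values)."""
    summary = {}
    for col in df.select_dtypes(exclude="number").columns:
        # Note the parentheses: unique() is a method call, not an attribute.
        summary[col] = (list(df[col].unique()), df[col].nunique())
    return summary

# Toy frame; column names follow the dataset, values are invented.
toy = pd.DataFrame({
    "Gender": ["Male", "Female", "Other", "Female"],
    "Device_Type": ["Laptop", "Mobile", "Laptop", "Mobile"],
    "Age": [21, 22, 23, 24],
})

summary = unique_summary(toy)
for col, (values, count) in summary.items():
    print(col)
    print(values)
    print(f"Number of unique values in the column {col} are : {count}")
```

For identifier-like columns such as Student_ID, printing only the count is usually enough, since every value is distinct.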
Student_ID
<bound method Series.unique of 0 STU100000
1 STU100001
2 STU100002
3 STU100003
4 STU100004
...
99995 STU199995
99996 STU199996
99997 STU199997
99998 STU199998
99999 STU199999
Name: Student_ID, Length: 100000, dtype: object>
Number of unique values in the column Student_ID are : 100000
-----------------------------------------------------------------------------------------------
Name
<bound method Series.unique of 0 Vihaan Patel
1 Arjun Nair
2 Aditya Bhardwaj
3 Krishna Singh
4 Krishna Nair
...
99995 Neha Singh
99996 Kavya Nair
99997 Neha Nair
99998 Pooja Sharma
99999 Rahul Patel
Name: Name, Length: 100000, dtype: object>
Number of unique values in the column Name are : 300
-----------------------------------------------------------------------------------------------
Gender
<bound method Series.unique of 0 Male
1 Female
2 Female
3 Female
4 Female
...
99995 Female
99996 Female
99997 Male
99998 Female
99999 Male
Name: Gender, Length: 100000, dtype: object>
Number of unique values in the column Gender are : 3
-----------------------------------------------------------------------------------------------
Education_Level
<bound method Series.unique of 0 Diploma
1 Bachelor
2 Master
3 Diploma
4 Master
...
99995 Bachelor
99996 Bachelor
99997 Master
99998 Bachelor
99999 Diploma
Name: Education_Level, Length: 100000, dtype: object>
Number of unique values in the column Education_Level are : 5
-----------------------------------------------------------------------------------------------
Employment_Status
<bound method Series.unique of 0 Student
1 Student
2 Student
3 Employed
4 Self-Employed
...
99995 Student
99996 Self-Employed
99997 Employed
99998 Student
99999 Student
Name: Employment_Status, Length: 100000, dtype: object>
Number of unique values in the column Employment_Status are : 4
-----------------------------------------------------------------------------------------------
City
<bound method Series.unique of 0 Indore
1 Delhi
2 Chennai
3 Surat
4 Lucknow
...
99995 Hyderabad
99996 Delhi
99997 Ahmedabad
99998 Ahmedabad
99999 Bengaluru
Name: City, Length: 100000, dtype: object>
Number of unique values in the column City are : 15
-----------------------------------------------------------------------------------------------
Device_Type
<bound method Series.unique of 0 Laptop
1 Laptop
2 Mobile
3 Mobile
4 Laptop
...
99995 Mobile
99996 Mobile
99997 Laptop
99998 Mobile
99999 Laptop
Name: Device_Type, Length: 100000, dtype: object>
Number of unique values in the column Device_Type are : 3
-----------------------------------------------------------------------------------------------
Internet_Connection_Quality
<bound method Series.unique of 0 Medium
1 Low
2 Medium
3 High
4 Medium
...
99995 Medium
99996 High
99997 Medium
99998 High
99999 Low
Name: Internet_Connection_Quality, Length: 100000, dtype: object>
Number of unique values in the column Internet_Connection_Quality are : 3
-----------------------------------------------------------------------------------------------
Course_ID
<bound method Series.unique of 0 C102
1 C106
2 C101
3 C105
4 C106
...
99995 C104
99996 C104
99997 C107
99998 C104
99999 C105
Name: Course_ID, Length: 100000, dtype: object>
Number of unique values in the column Course_ID are : 8
-----------------------------------------------------------------------------------------------
Course_Name
<bound method Series.unique of 0 Data Analysis with Python
1 Machine Learning A-Z
2 Python Basics
3 UI/UX Design Fundamentals
4 Machine Learning A-Z
...
99995 Digital Marketing Essentials
99996 Digital Marketing Essentials
99997 Statistics for Data Science
99998 Digital Marketing Essentials
99999 UI/UX Design Fundamentals
Name: Course_Name, Length: 100000, dtype: object>
Number of unique values in the column Course_Name are : 8
-----------------------------------------------------------------------------------------------
Category
<bound method Series.unique of 0 Programming
1 Programming
2 Programming
3 Design
4 Programming
...
99995 Marketing
99996 Marketing
99997 Math
99998 Marketing
99999 Design
Name: Category, Length: 100000, dtype: object>
Number of unique values in the column Category are : 5
-----------------------------------------------------------------------------------------------
Course_Level
<bound method Series.unique of 0 Intermediate
1 Advanced
2 Beginner
3 Beginner
4 Advanced
...
99995 Beginner
99996 Beginner
99997 Intermediate
99998 Beginner
99999 Beginner
Name: Course_Level, Length: 100000, dtype: object>
Number of unique values in the column Course_Level are : 3
-----------------------------------------------------------------------------------------------
Enrollment_Date
<bound method Series.unique of 0 01-06-2024
1 27-04-2025
2 20-01-2024
3 13-05-2025
4 19-12-2024
...
99995 12-07-2024
99996 13-01-2024
99997 14-08-2024
99998 21-06-2025
99999 15-03-2024
Name: Enrollment_Date, Length: 100000, dtype: object>
Number of unique values in the column Enrollment_Date are : 721
-----------------------------------------------------------------------------------------------
Payment_Mode
<bound method Series.unique of 0 Scholarship
1 Credit Card
2 NetBanking
3 UPI
4 Debit Card
...
99995 UPI
99996 UPI
99997 UPI
99998 Credit Card
99999 Credit Card
Name: Payment_Mode, Length: 100000, dtype: object>
Number of unique values in the column Payment_Mode are : 6
-----------------------------------------------------------------------------------------------
Fee_Paid
<bound method Series.unique of 0 No
1 Yes
2 Yes
3 Yes
4 Yes
...
99995 Yes
99996 Yes
99997 Yes
99998 Yes
99999 Yes
Name: Fee_Paid, Length: 100000, dtype: object>
Number of unique values in the column Fee_Paid are : 2
-----------------------------------------------------------------------------------------------
Discount_Used
<bound method Series.unique of 0 No
1 No
2 No
3 No
4 Yes
...
99995 No
99996 No
99997 Yes
99998 No
99999 No
Name: Discount_Used, Length: 100000, dtype: object>
Number of unique values in the column Discount_Used are : 2
-----------------------------------------------------------------------------------------------
Completed
<bound method Series.unique of 0 Completed
1 Not Completed
2 Completed
3 Completed
4 Completed
...
99995 Completed
99996 Not Completed
99997 Not Completed
99998 Not Completed
99999 Not Completed
Name: Completed, Length: 100000, dtype: object>
Number of unique values in the column Completed are : 2
-----------------------------------------------------------------------------------------------
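The blocks above print `<bound method Series.unique of ...>` rather than the distinct values. That is what pandas shows when `unique` is referenced without parentheses, so the method is never called. A minimal sketch of the likely intended loop follows, assuming a DataFrame named `df`. The toy frame and the `unique_summary` dictionary are illustrative and not part of the original notebook.

```python
import pandas as pd

# Toy frame standing in for the real dataset; the values are illustrative only.
df = pd.DataFrame({
    "Device_Type": ["Laptop", "Laptop", "Mobile", "Tablet"],
    "Fee_Paid": ["No", "Yes", "Yes", "Yes"],
})

unique_summary = {}  # hypothetical helper: column name -> list of distinct values
for col in df.columns:
    values = df[col].unique()  # the parentheses call the method
    unique_summary[col] = list(values)
    print(col)
    print(values)
    print(f"Number of unique values in the column {col} are : {df[col].nunique()}")
    print("-" * 95)
```

With the call in place, each block lists only the distinct values in order of first appearance, which is much shorter than the full Series preview.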
Value counts for categorical
Student_ID
Student_ID
STU199960 1
STU199961 1
STU199962 1
STU199963 1
STU199964 1
..
STU100003 1
STU100004 1
STU100005 1
STU100006 1
STU100007 1
Name: count, Length: 100000, dtype: int64
-----------------------------------------------------------------------------------------------
Name
Name
Ritika Iyer 398
Arjun Joshi 387
Sakshi Iyer 384
Ritika Reddy 381
Sneha Patel 375
...
Ritika Singh 300
Aarav Desai 299
Aarav Bose 297
Krishna Mehta 295
Sai Verma 280
Name: count, Length: 300, dtype: int64
-----------------------------------------------------------------------------------------------
Gender
Gender
Female 50187
Male 47819
Other 1994
Name: count, dtype: int64
-----------------------------------------------------------------------------------------------
Education_Level
Education_Level
Bachelor 54956
Master 21859
HighSchool 10107
Diploma 10032
PhD 3046
Name: count, dtype: int64
-----------------------------------------------------------------------------------------------
Employment_Status
Employment_Status
Employed 45091
Student 44929
Self-Employed 5067
Unemployed 4913
Name: count, dtype: int64
-----------------------------------------------------------------------------------------------
City
City
Indore 6747
Delhi 6743
Mumbai 6731
Surat 6722
Nagpur 6707
Bengaluru 6707
Kolkata 6700
Lucknow 6675
Hyderabad 6652
Ahmedabad 6634
Bhopal 6625
Vadodara 6621
Jaipur 6613
Pune 6574
Chennai 6549
Name: count, dtype: int64
-----------------------------------------------------------------------------------------------
Device_Type
Device_Type
Mobile 60021
Laptop 35018
Tablet 4961
Name: count, dtype: int64
-----------------------------------------------------------------------------------------------
Internet_Connection_Quality
Internet_Connection_Quality
Medium 49985
High 35002
Low 15013
Name: count, dtype: int64
-----------------------------------------------------------------------------------------------
Course_ID
Course_ID
C101 16807
C102 15465
C103 12702
C104 12686
C108 12631
C106 11198
C107 10025
C105 8486
Name: count, dtype: int64
-----------------------------------------------------------------------------------------------
Course_Name
Course_Name
Python Basics 16807
Data Analysis with Python 15465
Introduction to AI 12702
Digital Marketing Essentials 12686
Excel for Business 12631
Machine Learning A-Z 11198
Statistics for Data Science 10025
UI/UX Design Fundamentals 8486
Name: count, dtype: int64
-----------------------------------------------------------------------------------------------
Category
Category
Programming 56172
Marketing 12686
Business 12631
Math 10025
Design 8486
Name: count, dtype: int64
-----------------------------------------------------------------------------------------------
Course_Level
Course_Level
Beginner 50610
Intermediate 38192
Advanced 11198
Name: count, dtype: int64
-----------------------------------------------------------------------------------------------
Enrollment_Date
Enrollment_Date
13-07-2025 177
10-05-2025 175
20-10-2024 174
21-11-2023 174
24-12-2023 172
...
19-02-2025 109
30-05-2024 109
31-01-2024 109
07-04-2024 107
24-09-2024 102
Name: count, Length: 721, dtype: int64
-----------------------------------------------------------------------------------------------
Payment_Mode
Payment_Mode
UPI 30095
Free 19811
Credit Card 15129
Debit Card 14840
NetBanking 10086
Scholarship 10039
Name: count, dtype: int64
-----------------------------------------------------------------------------------------------
Fee_Paid
Fee_Paid
Yes 70150
No 29850
Name: count, dtype: int64
-----------------------------------------------------------------------------------------------
Discount_Used
Discount_Used
No 80058
Yes 19942
Name: count, dtype: int64
-----------------------------------------------------------------------------------------------
Completed
Completed
Not Completed 50970
Completed 49030
Name: count, dtype: int64
-----------------------------------------------------------------------------------------------
Pie Chart
Inference
- Females form the majority of the student population.
- Most students hold a Bachelor's degree.
- Employment status is dominated by Employed learners (45,091) and Students (44,929), in nearly equal numbers.
- City distribution is diverse, with no single city heavily dominating.
- Mobile devices are the most commonly used for accessing the platform.
- Internet quality is mostly Medium for users.
- Course enrollments are spread across all eight Course_IDs, ranging from 8,486 (C105) to 16,807 (C101).
- The most popular courses are Python Basics, Data Analysis with Python, and Introduction to AI.
- Programming is the most common course category.
- Most learners enroll in Beginner-level courses.
- Payment modes are widely distributed, with UPI used most frequently.
- A majority of users have paid their course fees.
- Most students do not use discounts.
- Course completion is nearly balanced between completed and not completed.
Count Plot
Inference
- There are more female learners than male learners.
- Most learners hold a Bachelor's degree as their highest education level.
- Most learners are either students or employed.
- The most preferred course is Python Basics and the least preferred is UI/UX Design Fundamentals.
- The majority of learners have chosen the Programming category.
- About half of the learners have chosen the Beginner course level.
- About 70% of the learners have paid their fees, and about 80% have not used a discount.
- Slightly more than half of the learners (about 51%) have not completed their course.
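A quick way to sanity-check percentage statements like these is to turn the printed value counts into shares of the total. The sketch below hard-codes three of the count tables from the output above. The helper name `to_percentages` is hypothetical. On the real DataFrame, `Series.value_counts(normalize=True)` would give the same shares directly.

```python
# Counts copied from the value_counts output above.
value_counts = {
    "Fee_Paid": {"Yes": 70150, "No": 29850},
    "Discount_Used": {"No": 80058, "Yes": 19942},
    "Completed": {"Not Completed": 50970, "Completed": 49030},
}

def to_percentages(counts):
    """Convert a {category: count} mapping into {category: percent of total}."""
    total = sum(counts.values())
    return {category: round(100 * n / total, 2) for category, n in counts.items()}

percentages = {col: to_percentages(c) for col, c in value_counts.items()}
for col, shares in percentages.items():
    print(col, shares)
```

Run on these counts, the shares come out at roughly 70% fee-paying learners, roughly 80% with no discount, and roughly 51% not completed.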
Bivariate Analysis¶
Numeric vs Numeric¶
Scatter plot
['Age', 'Course_Duration_Days', 'Instructor_Rating', 'Login_Frequency', 'Average_Session_Duration_Min', 'Video_Completion_Rate', 'Discussion_Participation', 'Time_Spent_Hours', 'Days_Since_Last_Login', 'Notifications_Checked', 'Peer_Interaction_Score', 'Assignments_Submitted', 'Assignments_Missed', 'Quiz_Attempts', 'Quiz_Score_Avg', 'Project_Grade', 'Progress_Percentage', 'Rewatch_Count', 'Payment_Amount', 'App_Usage_Percentage', 'Reminder_Emails_Clicked', 'Support_Tickets_Raised', 'Satisfaction_Rating']
Inference
- Video Completion Rate vs Progress Percentage: There is a clear positive trend; students who watch more videos also have higher overall course progress.
- Assignments Submitted vs Progress Percentage: Students who submit more assignments tend to show better progress in the course.
- Time Spent vs Progress Percentage: Learners who spend more hours on the platform generally progress more, though the relation is not perfectly linear.
- Quiz Score Avg vs Video Completion Rate: Higher video completion is linked with slightly better quiz scores, showing that consistent video learning helps performance.
- Project Grade vs Assignments Submitted: Students who submit more assignments usually achieve better project grades.
- Satisfaction Rating vs Instructor Rating: Higher instructor ratings are associated with higher student satisfaction, indicating instructor quality impacts user experience.
- Age vs Time Spent: Age does not show a strong relationship with time spent; learners across ages behave similarly.
- Fee Paid vs Completion: Students who paid the fee tend to complete the course more often than free learners.
Inference:
The following pairs of columns are highly correlated:
- Instructor_Rating vs Course_Duration_Days (0.64)
- Video_Completion_Rate vs Progress_Percentage (0.59)
- Progress_Percentage vs Assignments_Submitted (0.83)
- Assignments_Missed vs Progress_Percentage (-0.82)
- Assignments_Missed vs Assignments_Submitted
Inference
- Video_Completion_Rate ↔ Progress_Percentage: Strong positive relationship. Higher video completion goes with higher course progress.
- Assignments_Submitted ↔ Progress_Percentage: More submitted assignments go with higher overall progress.
- Assignments_Missed ↔ Progress_Percentage: Students who miss more assignments show lower progress.
- Assignments_Submitted ↔ Assignments_Missed: Clear inverse pattern: students who submit many assignments miss fewer.
- Video_Completion_Rate ↔ Assignments_Submitted: Students completing more videos tend to submit more assignments (higher engagement).
- Course_Duration_Days ↔ Progress_Percentage: Longer courses generally show lower or more scattered progress (possible difficulty/fatigue effect).
Inference
Strong Positive Correlation
- Video completion and progress percentage
- Assignments submitted and progress percentage
- Course duration days and instructor rating
Strong Negative Correlation
- Progress percentage and assignments missed
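The notebook does not show how the highly correlated pairs were picked out of the heatmap. One possible approach is sketched below: compute the Pearson coefficient for every pair of columns and keep those whose absolute value reaches a threshold. The threshold of 0.5, the helper names, and the toy values are all assumptions for illustration. On the real DataFrame, `df.corr(numeric_only=True)` would supply the coefficients.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def strong_pairs(columns, threshold=0.5):
    """Return (col_a, col_b, r) for every column pair with |r| >= threshold."""
    pairs = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            pairs.append((a, b, round(r, 2)))
    # Strongest relationships first, regardless of sign.
    return sorted(pairs, key=lambda p: -abs(p[2]))

# Illustrative values only, not taken from the dataset.
toy = {
    "Assignments_Submitted": [2, 4, 6, 8, 10],
    "Assignments_Missed": [8, 6, 4, 2, 0],
    "Age": [30, 22, 41, 25, 33],
}
highly_correlated = strong_pairs(toy)
print(highly_correlated)
```

Listing pairs this way also records the coefficient for every pair, which avoids entries without a value.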
Numeric vs Categoric¶
Inference
Satisfaction rating is slightly higher for intermediate learners and for students with a Master’s education. Some courses also show better satisfaction than others.
App usage is higher among students with a Master’s degree and slightly higher for females. Programming courses show the highest app usage.
Project grades are generally better for male students and improve as education level increases. Intermediate level students also score better.
Quiz scores are higher for students with higher education, especially Master-level. Programming courses also show better quiz scores than business-related ones.
Students with higher education, especially up to Master’s level, submit more assignments. Programming courses again show the highest submission levels.
Female learners spend more time on the platform. Time spent is highest at the intermediate course level and increases with education level up to Master’s.
Video completion rate improves with higher education levels. Programming courses and intermediate-level learners show the best completion rates.
Course duration is generally higher for students with higher education and for programming-related courses.
Age slightly varies across groups. Intermediate-level learners tend to be a bit older, and the “Other” gender category shows slightly higher age on average.
Categoric vs Categoric¶
Gender vs Education_Level
| Education_Level \ Gender | Female | Male | Other |
|---|---|---|---|
| Bachelor | 27646 | 26220 | 1090 |
| Diploma | 5088 | 4741 | 203 |
| HighSchool | 5028 | 4880 | 199 |
| Master | 10962 | 10462 | 435 |
| PhD | 1463 | 1516 | 67 |
Inference:
- Bachelor's degree holders are the largest group, and females slightly outnumber males at every education level except PhD.
Gender vs Employment_Status
| Employment_Status \ Gender | Female | Male | Other |
|---|---|---|---|
| Employed | 22721 | 21484 | 886 |
| Self-Employed | 2493 | 2486 | 88 |
| Student | 22560 | 21451 | 918 |
| Unemployed | 2413 | 2398 | 102 |
Inference:
- Students and Employed learners make up most of the enrollments for every gender.
Gender vs Completed
| Completed \ Gender | Female | Male | Other |
|---|---|---|---|
| Completed | 24794 | 23289 | 947 |
| Not Completed | 25393 | 24530 | 1047 |
Inference:
- The completed and not-completed counts are almost equal within each gender, and female learners are the largest group in both.
Education_Level VS Employment_Status
| Employment_Status \ Education_Level | Bachelor | Diploma | HighSchool | Master | PhD |
|---|---|---|---|---|---|
| Employed | 24812 | 4537 | 4466 | 9854 | 1422 |
| Self-Employed | 2834 | 470 | 533 | 1119 | 111 |
| Student | 24605 | 4560 | 4595 | 9803 | 1366 |
| Unemployed | 2705 | 465 | 513 | 1083 | 147 |
Inference:
- Bachelor’s degree holders dominate across all employment categories, with Students and Employed groups having the highest overall counts compared to Self-Employed and Unemployed individuals.
Completed VS Course_Name
| Completed \ Course_Name | Data Analysis with Python | Digital Marketing Essentials | Excel for Business | Introduction to AI | Machine Learning A-Z | Python Basics | Statistics for Data Science | UI/UX Design Fundamentals |
|---|---|---|---|---|---|---|---|---|
| Completed | 7553 | 6175 | 6206 | 6242 | 5509 | 8261 | 4932 | 4152 |
| Not Completed | 7912 | 6511 | 6425 | 6460 | 5689 | 8546 | 5093 | 4334 |
Inference:
- Python Basics and Data Analysis with Python attract the most learners.
Completed VS Category
| Completed \ Category | Business | Design | Marketing | Math | Programming |
|---|---|---|---|---|---|
| Completed | 6206 | 4152 | 6175 | 4932 | 27565 |
| Not Completed | 6425 | 4334 | 6511 | 5093 | 28607 |
Inference:
- Programming is the most popular category, followed by Marketing.
Course_Level VS Completed
| Completed \ Course_Level | Advanced | Beginner | Intermediate |
|---|---|---|---|
| Completed | 5509 | 24794 | 18727 |
| Not Completed | 5689 | 25816 | 19465 |
Inference:
- Most learners take Beginner and Intermediate courses; far fewer take Advanced courses.
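Raw counts in a crosstab mostly reflect group size, so it can help to convert each column into a completion rate. The sketch below does this for the Course_Level VS Completed table, using the counts exactly as printed. The dictionary layout and variable names are illustrative. On the real DataFrame, `pd.crosstab(..., normalize="columns")` would be the usual route.

```python
# Counts copied from the Course_Level VS Completed table above.
crosstab = {
    "Advanced": {"Completed": 5509, "Not Completed": 5689},
    "Beginner": {"Completed": 24794, "Not Completed": 25816},
    "Intermediate": {"Completed": 18727, "Not Completed": 19465},
}

# Share of learners at each level who completed the course, in percent.
completion_rate = {
    level: round(100 * cell["Completed"] / sum(cell.values()), 2)
    for level, cell in crosstab.items()
}

for level, rate in completion_rate.items():
    print(f"{level}: {rate}% completed")
```

All three levels land at roughly 49%, so Course_Level changes how many learners enroll but barely changes the share who finish.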
Target Variable Analysis¶
Inference:
- The target variable is split almost evenly within every category of each categorical column.
Inference
- Completed learners spend more time on the course. They log more hours and watch more videos compared to non-completers.
- Video Completion Rate is strongly higher for completed students. This is one of the clearest indicators of completion.
- Assignments Submitted is much higher for students who completed. Completing more assignments seems necessary for finishing the course.
- Students who completed also participate more in discussions. More engagement → higher chance of completion.
- Instructor Interaction Score is higher among completed users. Students who ask more questions or interact more tend to succeed.
- Quiz Score Average and Project Grades are higher for completed users. Better performance goes hand-in-hand with finishing the course.
- Progress Percentage is obviously much higher for completed users. This is expected, but the difference is large and clear.
- App Usage Percentage is higher for students who completed. They use the learning app more consistently.
- Payment Amount is slightly higher for completed users. This may indicate commitment: paid students finish more often.
- Reminder Emails Clicked is higher for completers. They respond more to reminders and stay on track.
- Support Tickets Raised is slightly higher among completers. They seek help when needed, which may prevent drop-off.
- Age and Instructor Rating don't show significant differences. These factors do not strongly affect whether a learner completes.
Inference
1. Both the male and female groups contain slightly more learners who did not complete than learners who completed.
Stats Test¶
Chi Square test for independence¶
For each categorical column X, the hypotheses are:
Ho: X and Completed are independent
Ha: X and Completed are dependent

| Column | Conclusion | Result |
|---|---|---|
| Student_ID | Fail to reject Ho | Student_ID and Completed are independent |
| Name | Fail to reject Ho | Name and Completed are independent |
| Gender | Reject Ho | Gender and Completed are dependent |
| Education_Level | Reject Ho | Education_Level and Completed are dependent |
| Employment_Status | Reject Ho | Employment_Status and Completed are dependent |
| City | Fail to reject Ho | City and Completed are independent |
| Device_Type | Reject Ho | Device_Type and Completed are dependent |
| Internet_Connection_Quality | Reject Ho | Internet_Connection_Quality and Completed are dependent |
| Course_ID | Fail to reject Ho | Course_ID and Completed are independent |
| Course_Name | Fail to reject Ho | Course_Name and Completed are independent |
| Category | Fail to reject Ho | Category and Completed are independent |
| Course_Level | Fail to reject Ho | Course_Level and Completed are independent |
| Enrollment_Date | Fail to reject Ho | Enrollment_Date and Completed are independent |
| Payment_Mode | Reject Ho | Payment_Mode and Completed are dependent |
| Fee_Paid | Reject Ho | Fee_Paid and Completed are dependent |
| Discount_Used | Fail to reject Ho | Discount_Used and Completed are independent |
Inference:
- Gender and Completed are dependent
- Education_Level and Completed are dependent
- Employment_Status and Completed are dependent
- Device_Type and Completed are dependent
- Internet_Connection_Quality and Completed are dependent
- Payment_Mode and Completed are dependent
- Fee_Paid and Completed are dependent
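To make the test concrete, the sketch below recomputes the chi-square statistic by hand for Gender and Completed, using the observed counts from the Gender vs Completed crosstab. The notebook's own test code is not shown, so this is not the author's implementation. The function name is hypothetical, and the 0.05 significance level is an assumption. The closed-form p-value used here holds only for 2 degrees of freedom. A library routine such as `scipy.stats.chi2_contingency` handles the general case.

```python
from math import exp

# Observed counts copied from the Gender vs Completed table above.
observed = {
    "Completed": {"Female": 24794, "Male": 23289, "Other": 947},
    "Not Completed": {"Female": 25393, "Male": 24530, "Other": 1047},
}

def chi_square_independence(table):
    """Return (chi2, dof) for a {row: {col: count}} contingency table."""
    rows = list(table)
    cols = list(next(iter(table.values())))
    row_totals = {r: sum(table[r].values()) for r in rows}
    col_totals = {c: sum(table[r][c] for r in rows) for c in cols}
    grand_total = sum(row_totals.values())
    chi2 = 0.0
    for r in rows:
        for c in cols:
            # Expected count if the row and column variables were independent.
            expected = row_totals[r] * col_totals[c] / grand_total
            chi2 += (table[r][c] - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(cols) - 1)
    return chi2, dof

chi2, dof = chi_square_independence(observed)
# For dof == 2 the chi-square survival function has the closed form exp(-x / 2).
p_value = exp(-chi2 / 2) if dof == 2 else None
print(f"chi2={chi2:.3f}, dof={dof}, p={p_value:.4f}")
```

The p-value falls below 0.05, which agrees with the "Reject Ho" conclusion reported for Gender. Identifier-like columns such as Student_ID have one row per category, so their expected cell counts are below 1 and the chi-square approximation is not reliable for them.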
T-Test¶
Independent two tail t-test
Assumptions:
- The sample size is large (N > 30) and the population standard deviation is not known, so a t-test is used.
- The test is two-tailed because we are testing for any significant difference between the group means, without assuming whether one group's mean is higher or lower.
- Out of the 23 numeric variables, we manually picked some of the most significant variables for the t-test.
- The selected numerical columns are 'Course_Duration_Days', 'Login_Frequency', 'Average_Session_Duration_Min', 'Video_Completion_Rate', 'Discussion_Participation', 'Time_Spent_Hours', 'Days_Since_Last_Login', 'Assignments_Submitted', 'Quiz_Score_Avg', 'Progress_Percentage'.
50970
For each numeric column X, the hypotheses are:
Ho: Mean of X is the same for 'Completed' and 'Not Completed' (μ(Completed) = μ(Not Completed))
Ha: Mean of X is not the same for 'Completed' and 'Not Completed' (μ(Completed) != μ(Not Completed))

| Column | Conclusion | Result |
|---|---|---|
| Course_Duration_Days | Fail to reject Ho | Mean is the same for 'Completed' and 'Not Completed' |
| Login_Frequency | Reject Ho | Mean is not the same for 'Completed' and 'Not Completed' |
| Average_Session_Duration_Min | Reject Ho | Mean is not the same for 'Completed' and 'Not Completed' |
| Video_Completion_Rate | Reject Ho | Mean is not the same for 'Completed' and 'Not Completed' |
| Discussion_Participation | Reject Ho | Mean is not the same for 'Completed' and 'Not Completed' |
| Time_Spent_Hours | Reject Ho | Mean is not the same for 'Completed' and 'Not Completed' |
| Days_Since_Last_Login | Reject Ho | Mean is not the same for 'Completed' and 'Not Completed' |
| Assignments_Submitted | Reject Ho | Mean is not the same for 'Completed' and 'Not Completed' |
| Quiz_Score_Avg | Reject Ho | Mean is not the same for 'Completed' and 'Not Completed' |
| Progress_Percentage | Reject Ho | Mean is not the same for 'Completed' and 'Not Completed' |
Inference
Only Course_Duration_Days shows no significant difference in mean between 'Completed' and 'Not Completed'.
The means of Login_Frequency, Average_Session_Duration_Min, Video_Completion_Rate, Discussion_Participation, Time_Spent_Hours, Days_Since_Last_Login, Assignments_Submitted, Quiz_Score_Avg and Progress_Percentage differ significantly between the two groups.
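The mechanics of a two-tailed, two-sample test can be sketched with the standard library alone. This is not the notebook's code, which is not shown. It uses Welch's variant (unequal variances) and a normal approximation for the p-value, which is reasonable only for large samples like the ones here. The function name, the synthetic data, and the 0.05 threshold are assumptions. In practice `scipy.stats.ttest_ind` is the usual choice.

```python
import random
from math import erfc, sqrt
from statistics import mean, variance

def welch_t_test(group_a, group_b):
    """Two-tailed Welch t-test; the p-value uses a large-sample normal approximation."""
    standard_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    t_stat = (mean(group_a) - mean(group_b)) / standard_error
    # Two-tailed area under the standard normal curve beyond |t|.
    p_value = erfc(abs(t_stat) / sqrt(2))
    return t_stat, p_value

# Synthetic stand-ins for one numeric column split by the target; not real data.
random.seed(0)
completed = [random.gauss(70, 10) for _ in range(500)]
not_completed = [random.gauss(50, 10) for _ in range(500)]

t_stat, p_value = welch_t_test(completed, not_completed)
decision = "Reject Ho" if p_value < 0.05 else "Fail to reject Ho"
print(f"t={t_stat:.2f}, p={p_value:.4g}, {decision}")
```

With roughly 50,000 learners in each group, even very small differences in means can come out as statistically significant, so it is worth reporting the size of each difference alongside the decision.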
Encoding¶
Student_ID ['STU100000' 'STU100001' 'STU100002' ... 'STU199997' 'STU199998' 'STU199999'] ------------------------------------------------------------------------- Name ['Vihaan Patel' 'Arjun Nair' 'Aditya Bhardwaj' 'Krishna Singh' 'Krishna Nair' 'Rohan Reddy' 'Sai Nair' 'Krishna Desai' 'Vihaan Joshi' 'Vivaan Nair' 'Aditya Gupta' 'Sneha Bhardwaj' 'Rohan Desai' 'Vihaan Mehta' 'Sai Joshi' 'Sai Reddy' 'Rahul Singh' 'Rohan Shah' 'Neha Sharma' 'Pooja Singh' 'Kavya Verma' 'Rahul Kumar' 'Priya Joshi' 'Ritika Reddy' 'Arjun Iyer' 'Sakshi Sharma' 'Ritika Kumar' 'Vihaan Iyer' 'Pooja Sharma' 'Aarav Nair' 'Sai Gupta' 'Ananya Shah' 'Arjun Joshi' 'Aarav Bhardwaj' 'Sneha Bose' 'Vivaan Singh' 'Aditya Nair' 'Arjun Verma' 'Sai Kumar' 'Krishna Bose' 'Sneha Reddy' 'Rohan Singh' 'Krishna Gupta' 'Nikhil Reddy' 'Krishna Sharma' 'Isha Mehta' 'Isha Verma' 'Kavya Desai' 'Vivaan Iyer' 'Rohan Verma' 'Priya Bose' 'Priya Kumar' 'Ananya Mehta' 'Meera Verma' 'Nikhil Nair' 'Vivaan Patel' 'Sai Patel' 'Priya Sharma' 'Sakshi Patel' 'Nikhil Bhardwaj' 'Krishna Iyer' 'Priya Verma' 'Pooja Bose' 'Kavya Bhardwaj' 'Aditya Bose' 'Kavya Shah' 'Ritika Patel' 'Vihaan Bose' 'Rahul Verma' 'Aditya Iyer' 'Aarav Joshi' 'Rahul Patel' 'Arjun Gupta' 'Sneha Singh' 'Sneha Desai' 'Sakshi Nair' 'Sai Desai' 'Aarav Verma' 'Rohan Kumar' 'Priya Shah' 'Krishna Shah' 'Pooja Kumar' 'Sneha Iyer' 'Pooja Patel' 'Sai Bhardwaj' 'Isha Joshi' 'Sakshi Gupta' 'Krishna Kumar' 'Ritika Bose' 'Krishna Joshi' 'Pooja Bhardwaj' 'Ananya Gupta' 'Priya Nair' 'Isha Iyer' 'Sakshi Bhardwaj' 'Ananya Kumar' 'Sakshi Iyer' 'Aditya Kumar' 'Ananya Singh' 'Kavya Mehta' 'Priya Reddy' 'Aarav Bose' 'Isha Singh' 'Rahul Bhardwaj' 'Sakshi Shah' 'Rahul Reddy' 'Kavya Kumar' 'Neha Desai' 'Aditya Verma' 'Pooja Gupta' 'Aarav Shah' 'Meera Patel' 'Isha Gupta' 'Aarav Mehta' 'Sai Shah' 'Aarav Sharma' 'Kavya Joshi' 'Rohan Iyer' 'Vivaan Desai' 'Vihaan Bhardwaj' 'Sneha Verma' 'Nikhil Iyer' 'Rohan Sharma' 'Sakshi Desai' 'Vivaan Joshi' 'Meera Sharma' 'Kavya Nair' 'Neha Bhardwaj' 
'Pooja Shah' 'Neha Bose' 'Meera Shah' 'Neha Gupta' 'Aditya Singh' 'Rohan Nair' 'Rahul Joshi' 'Sai Singh' 'Nikhil Desai' 'Vivaan Gupta' 'Meera Reddy' 'Neha Shah' 'Vivaan Verma' 'Isha Kumar' 'Vihaan Desai' 'Meera Kumar' 'Kavya Sharma' 'Vihaan Gupta' 'Arjun Sharma' 'Sakshi Kumar' 'Meera Gupta' 'Aditya Reddy' 'Vihaan Shah' 'Vivaan Reddy' 'Kavya Bose' 'Vihaan Reddy' 'Meera Bose' 'Isha Shah' 'Kavya Gupta' 'Ritika Nair' 'Rahul Mehta' 'Rohan Bhardwaj' 'Priya Iyer' 'Meera Desai' 'Sneha Mehta' 'Kavya Iyer' 'Nikhil Kumar' 'Neha Reddy' 'Vivaan Bose' 'Krishna Bhardwaj' 'Vivaan Kumar' 'Sneha Nair' 'Pooja Nair' 'Rahul Bose' 'Ananya Nair' 'Priya Patel' 'Vihaan Verma' 'Arjun Mehta' 'Kavya Reddy' 'Sneha Gupta' 'Vivaan Mehta' 'Arjun Desai' 'Pooja Desai' 'Aarav Kumar' 'Nikhil Joshi' 'Aditya Joshi' 'Priya Bhardwaj' 'Sakshi Mehta' 'Rahul Nair' 'Rohan Mehta' 'Isha Nair' 'Neha Joshi' 'Aditya Sharma' 'Vihaan Nair' 'Pooja Iyer' 'Krishna Verma' 'Sneha Shah' 'Nikhil Singh' 'Sakshi Bose' 'Pooja Joshi' 'Isha Reddy' 'Sakshi Reddy' 'Ananya Joshi' 'Sai Verma' 'Ritika Gupta' 'Aditya Mehta' 'Aarav Reddy' 'Nikhil Mehta' 'Sneha Sharma' 'Isha Bhardwaj' 'Arjun Singh' 'Sakshi Joshi' 'Arjun Reddy' 'Ananya Desai' 'Ritika Sharma' 'Nikhil Gupta' 'Sneha Joshi' 'Arjun Bose' 'Sneha Patel' 'Ritika Desai' 'Arjun Bhardwaj' 'Rohan Gupta' 'Ritika Mehta' 'Arjun Shah' 'Vihaan Kumar' 'Meera Nair' 'Neha Singh' 'Ananya Sharma' 'Rahul Gupta' 'Arjun Patel' 'Meera Iyer' 'Kavya Patel' 'Pooja Verma' 'Sakshi Verma' 'Ritika Joshi' 'Neha Nair' 'Isha Desai' 'Nikhil Verma' 'Ritika Verma' 'Rohan Joshi' 'Ritika Iyer' 'Aditya Patel' 'Ananya Reddy' 'Rahul Sharma' 'Meera Bhardwaj' 'Priya Gupta' 'Pooja Mehta' 'Pooja Reddy' 'Ananya Patel' 'Neha Verma' 'Aarav Gupta' 'Sai Mehta' 'Priya Desai' 'Neha Iyer' 'Sakshi Singh' 'Ritika Singh' 'Aarav Desai' 'Aarav Singh' 'Meera Singh' 'Vivaan Shah' 'Ananya Bose' 'Rahul Iyer' 'Vivaan Bhardwaj' 'Sai Sharma' 'Ananya Iyer' 'Vivaan Sharma' 'Nikhil Shah' 'Sai Bose' 'Aditya Desai' 'Krishna Reddy' 'Nikhil 
Sharma' 'Vihaan Singh' 'Meera Mehta' 'Rahul Shah' 'Ananya Bhardwaj' 'Ananya Verma' 'Vihaan Sharma' 'Krishna Patel' 'Rahul Desai' 'Isha Patel' 'Nikhil Patel' 'Ritika Bhardwaj' 'Neha Mehta' 'Meera Joshi' 'Isha Bose' 'Priya Singh' 'Rohan Bose' 'Ritika Shah' 'Aarav Iyer' 'Krishna Mehta' 'Rohan Patel' 'Priya Mehta' 'Sai Iyer' 'Aarav Patel' 'Arjun Kumar' 'Sneha Kumar' 'Neha Kumar' 'Aditya Shah' 'Kavya Singh' 'Nikhil Bose' 'Neha Patel' 'Isha Sharma'] ------------------------------------------------------------------------- Gender ['Male' 'Female' 'Other'] ------------------------------------------------------------------------- Education_Level ['Diploma' 'Bachelor' 'Master' 'HighSchool' 'PhD'] ------------------------------------------------------------------------- Employment_Status ['Student' 'Employed' 'Self-Employed' 'Unemployed'] ------------------------------------------------------------------------- City ['Indore' 'Delhi' 'Chennai' 'Surat' 'Lucknow' 'Jaipur' 'Hyderabad' 'Nagpur' 'Kolkata' 'Ahmedabad' 'Pune' 'Mumbai' 'Bengaluru' 'Bhopal' 'Vadodara'] ------------------------------------------------------------------------- Device_Type ['Laptop' 'Mobile' 'Tablet'] ------------------------------------------------------------------------- Internet_Connection_Quality ['Medium' 'Low' 'High'] ------------------------------------------------------------------------- Course_ID ['C102' 'C106' 'C101' 'C105' 'C103' 'C104' 'C107' 'C108'] ------------------------------------------------------------------------- Course_Name ['Data Analysis with Python' 'Machine Learning A-Z' 'Python Basics' 'UI/UX Design Fundamentals' 'Introduction to AI' 'Digital Marketing Essentials' 'Statistics for Data Science' 'Excel for Business'] ------------------------------------------------------------------------- Category ['Programming' 'Design' 'Marketing' 'Math' 'Business'] ------------------------------------------------------------------------- Course_Level ['Intermediate' 'Advanced' 'Beginner'] 
------------------------------------------------------------------------- Enrollment_Date ['01-06-2024' '27-04-2025' '20-01-2024' '13-05-2025' '19-12-2024' '23-10-2023' '24-03-2024' '09-11-2024' '13-07-2024' '07-11-2024' '11-07-2025' '02-12-2023' '11-06-2024' '24-10-2024' '16-07-2024' '23-06-2024' '02-08-2025' '13-12-2023' '30-11-2024' '19-05-2025' '05-08-2025' '05-06-2025' '06-03-2024' '06-02-2024' '22-12-2024' '07-02-2025' '28-10-2023' '23-01-2025' '12-03-2025' '18-07-2025' '07-05-2024' '12-04-2025' '08-02-2024' '18-06-2024' '05-01-2024' '14-04-2025' '01-05-2025' '23-12-2024' '22-09-2025' '03-09-2025' '17-02-2024' '10-06-2025' '28-12-2023' '19-09-2024' '25-02-2025' '23-05-2025' '06-12-2024' '13-10-2024' '11-12-2023' '05-02-2024' '08-03-2024' '02-01-2024' '25-01-2024' '22-05-2024' '19-10-2023' '24-01-2025' '08-05-2025' '15-12-2023' '02-09-2024' '24-04-2024' '13-06-2024' '31-05-2025' '20-08-2025' '10-07-2024' '14-07-2024' '27-01-2025' '26-10-2023' '31-12-2023' '08-10-2024' '26-03-2024' '04-07-2025' '18-08-2024' '03-02-2024' '19-05-2024' '08-12-2023' '01-12-2023' '11-07-2024' '27-12-2024' '23-09-2025' '04-06-2025' '27-08-2024' '26-11-2023' '30-01-2024' '15-06-2024' '12-12-2024' '11-12-2024' '04-02-2024' '31-05-2024' '24-08-2024' '28-11-2024' '28-07-2024' '18-04-2025' '18-06-2025' '05-01-2025' '29-05-2025' '18-10-2023' '02-09-2025' '27-08-2025' '21-02-2024' '15-09-2024' '30-04-2024' '12-04-2024' '17-10-2023' '01-06-2025' '11-01-2025' '07-04-2024' '12-07-2025' '11-11-2023' '30-12-2023' '03-05-2024' '18-08-2025' '28-09-2025' '18-02-2024' '12-01-2025' '26-12-2024' '14-02-2025' '30-07-2025' '19-07-2025' '04-08-2024' '22-04-2025' '26-09-2025' '07-12-2024' '15-05-2024' '19-11-2024' '25-06-2025' '04-12-2024' '15-07-2025' '16-05-2025' '31-01-2025' '20-05-2024' '08-11-2024' '08-01-2024' '09-10-2024' '14-08-2025' '08-05-2024' '12-02-2025' '05-07-2024' '11-02-2024' '31-08-2024' '26-11-2024' '20-07-2025' '16-09-2025' '18-05-2025' '30-10-2024' '20-12-2024' '28-07-2025' 
'03-04-2025' '23-02-2025' '17-07-2024' '13-01-2024' '03-02-2025' '15-11-2023' '21-05-2024' '29-04-2024' '07-05-2025' '02-03-2025' '30-05-2024' '03-06-2025' '25-12-2024' '20-07-2024' '01-05-2024' '26-07-2024' '27-01-2024' '21-08-2024' '23-02-2024' '03-08-2024' '22-12-2023' '18-09-2025' '23-08-2024' '16-12-2024' '04-04-2025' '02-07-2025' '01-01-2024' '16-06-2025' '01-08-2024' '15-07-2024' '26-08-2024' '07-08-2024' '02-10-2025' '07-01-2025' '11-09-2024' '06-09-2025' '04-09-2025' '13-04-2024' '27-07-2024' '03-03-2025' '09-09-2024' '02-05-2025' '12-09-2024' '30-03-2025' '06-08-2024' '30-08-2024' '03-06-2024' '22-03-2024' '22-07-2024' '22-06-2024' '21-06-2025' '27-12-2023' '29-09-2024' '17-08-2024' '17-03-2024' '04-05-2025' '27-04-2024' '10-12-2024' '10-11-2024' '20-03-2024' '12-06-2025' '09-01-2025' '12-03-2024' '25-09-2025' '27-02-2025' '24-07-2025' '18-10-2024' '15-09-2025' '10-06-2024' '29-08-2025' '04-10-2024' '13-11-2024' '21-08-2025' '05-08-2024' '29-07-2025' '22-08-2025' '08-08-2025' '14-08-2024' '29-02-2024' '21-02-2025' '04-06-2024' '17-01-2024' '06-06-2025' '20-12-2023' '08-12-2024' '19-04-2025' '29-09-2025' '23-03-2024' '08-07-2025' '26-05-2024' '26-01-2024' '19-02-2024' '01-08-2025' '06-05-2025' '27-02-2024' '23-08-2025' '16-06-2024' '09-12-2023' '21-12-2023' '15-06-2025' '07-07-2025' '17-11-2023' '13-09-2024' '17-03-2025' '21-01-2025' '01-09-2024' '13-02-2025' '01-01-2025' '14-05-2025' '28-08-2025' '19-03-2025' '12-11-2023' '28-08-2024' '11-03-2024' '26-07-2025' '05-11-2024' '24-03-2025' '10-09-2025' '13-05-2024' '04-07-2024' '30-12-2024' '08-06-2024' '25-11-2024' '06-04-2024' '22-03-2025' '07-08-2025' '06-05-2024' '17-05-2025' '04-09-2024' '11-10-2024' '24-11-2023' '01-02-2025' '28-11-2023' '26-04-2024' '19-12-2023' '02-10-2024' '26-06-2024' '27-03-2024' '19-08-2025' '11-01-2024' '17-09-2024' '17-09-2025' '20-03-2025' '23-05-2024' '05-03-2024' '28-01-2024' '08-04-2025' '10-07-2025' '19-04-2024' '18-01-2025' '08-02-2025' '10-01-2025' '14-03-2025' 
'27-06-2025' '20-02-2025' '28-03-2025' '06-01-2025' '02-11-2024' '14-11-2023' '12-08-2025' '05-11-2023' '01-07-2025' '11-03-2025' '27-05-2025' '25-04-2024' '07-04-2025' '16-10-2024' '21-07-2025' '21-09-2024' '27-11-2024' '09-08-2025' '04-12-2023' '15-11-2024' '27-10-2024' '24-12-2024' '18-04-2024' '16-09-2024' '29-11-2023' '14-12-2023' '25-03-2025' '19-11-2023' '10-04-2024' '08-09-2024' '09-08-2024' '03-10-2025' '05-03-2025' '06-02-2025' '23-12-2023' '05-02-2025' '03-05-2025' '02-01-2025' '24-07-2024' '02-06-2025' '08-08-2024' '17-12-2023' '12-05-2024' '07-09-2024' '27-03-2025' '07-12-2023' '27-06-2024' '09-11-2023' '19-06-2024' '31-07-2024' '25-10-2024' '22-05-2025' '29-07-2024' '24-01-2024' '10-05-2025' '21-03-2025' '05-07-2025' '03-11-2023' '20-09-2025' '18-11-2023' '10-02-2025' '28-04-2025' '13-09-2025' '25-06-2024' '16-08-2024' '28-02-2025' '23-07-2024' '14-03-2024' '18-02-2025' '08-06-2025' '13-12-2024' '15-08-2025' '16-01-2024' '23-11-2023' '25-04-2025' '16-03-2025' '20-05-2025' '05-12-2024' '01-09-2025' '17-02-2025' '21-10-2024' '29-04-2025' '02-04-2024' '12-09-2025' '20-10-2023' '03-10-2024' '26-12-2023' '03-03-2024' '06-11-2024' '21-04-2025' '16-04-2024' '02-02-2025' '05-12-2023' '12-06-2024' '28-05-2025' '18-12-2023' '19-01-2025' '09-06-2025' '28-04-2024' '21-06-2024' '19-10-2024' '04-08-2025' '22-04-2024' '23-07-2025' '27-05-2024' '31-07-2025' '02-11-2023' '16-08-2025' '09-04-2024' '20-09-2024' '16-05-2024' '17-06-2025' '04-11-2024' '27-07-2025' '19-07-2024' '20-06-2025' '11-05-2025' '04-03-2025' '21-12-2024' '30-06-2024' '09-07-2025' '18-09-2024' '13-08-2025' '27-09-2025' '30-07-2024' '18-05-2024' '22-08-2024' '24-09-2025' '09-09-2025' '11-09-2025' '12-01-2024' '22-02-2025' '17-11-2024' '11-04-2024' '29-05-2024' '07-03-2025' '29-01-2024' '08-01-2025' '28-12-2024' '19-06-2025' '11-08-2024' '26-03-2025' '16-12-2023' '18-07-2024' '28-10-2024' '26-09-2024' '24-02-2025' '03-12-2023' '26-01-2025' '31-10-2024' '22-01-2024' '25-10-2023' '01-03-2024' 
'20-02-2024' '29-12-2024' '30-06-2025' '29-10-2023' '21-05-2025' '15-12-2024' '29-08-2024' '17-06-2024' '12-02-2024' '04-10-2025' '04-04-2024' '12-12-2023' '12-11-2024' '22-11-2023' '16-02-2024' '04-02-2025' '10-08-2024' '04-05-2024' '10-03-2024' '01-04-2025' '23-11-2024' '25-08-2025' '03-09-2024' '06-09-2024' '11-05-2024' '15-10-2024' '09-07-2024' '13-01-2025' '26-05-2025' '16-11-2023' '28-06-2024' '30-09-2025' '06-11-2023' '06-07-2025' '15-04-2024' '11-08-2025' '11-06-2025' '13-08-2024' '08-03-2025' '06-10-2024' '15-01-2024' '30-05-2025' '20-11-2023' '01-02-2024' '29-03-2024' '27-09-2024' '03-01-2025' '23-01-2024' '14-01-2024' '19-02-2025' '12-07-2024' '30-08-2025' '23-04-2025' '17-04-2025' '21-11-2023' '24-05-2025' '04-01-2025' '24-12-2023' '14-09-2024' '09-02-2024' '25-11-2023' '01-11-2024' '18-03-2025' '09-05-2024' '20-10-2024' '01-04-2024' '16-07-2025' '25-09-2024' '22-06-2025' '25-02-2024' '21-04-2024' '14-02-2024' '07-01-2024' '03-12-2024' '15-04-2025' '10-09-2024' '25-08-2024' '01-10-2024' '28-02-2024' '16-04-2025' '24-08-2025' '09-03-2025' '08-04-2024' '14-07-2025' '28-03-2024' '31-03-2025' '03-08-2025' '16-01-2025' '04-11-2023' '17-08-2025' '22-01-2025' '09-12-2024' '21-10-2023' '31-01-2024' '24-06-2025' '17-07-2025' '30-10-2023' '10-12-2023' '10-08-2025' '25-05-2025' '19-01-2024' '10-02-2024' '05-10-2025' '07-07-2024' '07-06-2025' '29-06-2025' '25-03-2024' '09-03-2024' '01-07-2024' '02-05-2024' '26-04-2025' '29-03-2025' '08-11-2023' '02-08-2024' '23-06-2025' '20-04-2024' '13-04-2025' '30-11-2023' '16-02-2025' '24-10-2023' '02-07-2024' '17-04-2024' '18-01-2024' '06-03-2025' '01-11-2023' '28-09-2024' '05-09-2025' '15-02-2025' '20-06-2024' '13-07-2025' '23-03-2025' '30-01-2025' '10-03-2025' '18-03-2024' '22-07-2025' '06-07-2024' '13-03-2024' '29-12-2023' '05-09-2024' '24-11-2024' '22-10-2024' '10-01-2024' '21-07-2024' '05-06-2024' '04-03-2024' '30-04-2025' '06-01-2024' '09-02-2025' '14-11-2024' '24-04-2025' '07-10-2024' '23-04-2024' '20-04-2025' 
'01-03-2025' '06-12-2023' '02-02-2024' '25-07-2024' '04-01-2024' '23-09-2024' '22-11-2024' '16-03-2024' '13-03-2025' '05-04-2025' '31-10-2023' '17-12-2024' '09-05-2025' '15-01-2025' '22-10-2023' '08-07-2024' '26-02-2024' '25-07-2025' '03-11-2024' '01-10-2025' '06-08-2025' '10-10-2024' '26-02-2025' '07-03-2024' '26-08-2025' '13-06-2025' '20-08-2024' '11-04-2025' '31-12-2024' '20-11-2024' '21-09-2025' '03-07-2025' '14-01-2025' '06-06-2024' '25-05-2024' '05-05-2025' '06-10-2025' '10-11-2023' '27-11-2023' '03-01-2024' '31-03-2024' '06-04-2025' '21-01-2024' '19-09-2025' '08-09-2025' '29-11-2024' '09-06-2024' '25-12-2023' '11-11-2024' '18-11-2024' '30-09-2024' '24-06-2024' '07-11-2023' '14-06-2024' '20-01-2025' '28-01-2025' '15-03-2025' '24-02-2024' '19-03-2024' '02-12-2024' '19-08-2024' '26-06-2025' '07-09-2025' '21-11-2024' '30-03-2024' '27-10-2023' '01-12-2024' '09-04-2025' '28-06-2025' '22-02-2024' '02-06-2024' '14-06-2025' '22-09-2024' '17-10-2024' '24-09-2024' '14-04-2024' '07-02-2024' '09-01-2024' '10-05-2024' '03-04-2024' '03-07-2024' '07-06-2024' '12-05-2025' '28-05-2024' '12-10-2024' '23-10-2024' '14-09-2025' '14-12-2024' '05-10-2024' '17-05-2024' '15-03-2024' '14-10-2024' '29-06-2024' '31-08-2025' '26-10-2024' '24-05-2024' '29-10-2024' '15-02-2024' '18-12-2024' '11-02-2025' '13-11-2023' '16-11-2024' '15-08-2024' '13-02-2024' '02-03-2024' '12-08-2024' '10-04-2025' '14-05-2024' '05-04-2024' '05-05-2024' '15-05-2025' '25-01-2025' '21-03-2024' '17-01-2025' '29-01-2025' '02-04-2025'] ------------------------------------------------------------------------- Payment_Mode ['Scholarship' 'Credit Card' 'NetBanking' 'UPI' 'Debit Card' 'Free'] ------------------------------------------------------------------------- Fee_Paid ['No' 'Yes'] ------------------------------------------------------------------------- Discount_Used ['No' 'Yes'] ------------------------------------------------------------------------- Completed ['Completed' 'Not Completed'] 
-------------------------------------------------------------------------
- n-1 dummy --> Fee_Paid, Discount_Used, Completed
- ordinal --> Education_Level, Internet_Connection_Quality, Course_Level
- label --> Gender, Employment_Status, City, Device_Type, Course_Name, Payment_Mode, Category
- drop --> Student_ID, Name, Course_ID, Enrollment_Date
['Age', 'Course_Duration_Days', 'Instructor_Rating', 'Login_Frequency', 'Average_Session_Duration_Min', 'Video_Completion_Rate', 'Discussion_Participation', 'Time_Spent_Hours', 'Days_Since_Last_Login', 'Notifications_Checked', 'Peer_Interaction_Score', 'Assignments_Submitted', 'Assignments_Missed', 'Quiz_Attempts', 'Quiz_Score_Avg', 'Project_Grade', 'Progress_Percentage', 'Rewatch_Count', 'Payment_Amount', 'App_Usage_Percentage', 'Reminder_Emails_Clicked', 'Support_Tickets_Raised', 'Satisfaction_Rating']
Dropping the columns¶
(100000, 36)
N-1 Dummy¶
| | Fee_Paid | Discount_Used | Completed |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 1 | 1 | 0 | 1 |
| 2 | 1 | 0 | 0 |
| 3 | 1 | 0 | 0 |
| 4 | 1 | 1 | 0 |
| ... | ... | ... | ... |
| 99995 | 1 | 0 | 0 |
| 99996 | 1 | 0 | 1 |
| 99997 | 1 | 1 | 1 |
| 99998 | 1 | 0 | 1 |
| 99999 | 1 | 0 | 1 |
100000 rows × 3 columns
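As a minimal sketch of the n-1 dummy step, assuming it is done with `pd.get_dummies(..., drop_first=True)` (the notebook's own code is not shown in this export), a two-level column collapses to a single 0/1 column. The toy values and the column renaming are illustrative assumptions.

```python
import pandas as pd

# Toy frame; the values are illustrative, not taken from the dataset.
df = pd.DataFrame({
    "Fee_Paid": ["No", "Yes", "Yes"],
    "Discount_Used": ["No", "No", "Yes"],
})

# drop_first=True keeps n-1 indicator columns per feature,
# so each two-level column becomes one 0/1 column.
dummies = pd.get_dummies(df, drop_first=True, dtype=int)

# Assumption: restore the original column names (Fee_Paid_Yes -> Fee_Paid, ...).
dummies.columns = df.columns
print(dummies)
```

With `drop_first=True` the alphabetically first label becomes 0. For a column whose labels are 'Completed' and 'Not Completed', this approach would code 'Not Completed' as 1, so it is worth confirming which label the 1 in `Completed` refers to before interpreting any model output.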
Ordinal Encoding¶
Education_Level
array(['Diploma', 'Bachelor', 'Master', 'HighSchool', 'PhD'], dtype=object)
array([3., 2., 1., 4., 0.])
Internet_Connection_Quality
array(['Medium', 'Low', 'High'], dtype=object)
array([1., 2., 0.])
Course_Level
array(['Intermediate', 'Advanced', 'Beginner'], dtype=object)
array([1., 0., 2.])
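The codes above are consistent with categories listed from highest to lowest (for example PhD = 0 and HighSchool = 4). A minimal sketch, assuming scikit-learn's `OrdinalEncoder` with an explicit category order inferred from the printed arrays:

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Assumed order (highest to lowest), inferred from the output above.
order = [["PhD", "Master", "Bachelor", "Diploma", "HighSchool"]]
encoder = OrdinalEncoder(categories=order)

# Same sequence as the unique values printed for Education_Level.
levels = np.array(
    [["Diploma"], ["Bachelor"], ["Master"], ["HighSchool"], ["PhD"]],
    dtype=object,
)
codes = encoder.fit_transform(levels).ravel()
print(codes)
```

Passing the order explicitly matters: without it the encoder sorts labels alphabetically, which would not reflect the ranking of education levels.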
Label Encoding¶
| | Gender | Employment_Status | City | Device_Type | Course_Name | Payment_Mode | Category |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 2 | 6 | 0 | 0 | 4 | 4 |
| 1 | 0 | 2 | 4 | 0 | 4 | 0 | 4 |
| 2 | 0 | 2 | 3 | 1 | 5 | 3 | 4 |
| 3 | 0 | 0 | 13 | 1 | 7 | 5 | 1 |
| 4 | 0 | 1 | 9 | 0 | 4 | 1 | 4 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 99995 | 0 | 2 | 5 | 1 | 1 | 5 | 2 |
| 99996 | 0 | 1 | 4 | 1 | 1 | 5 | 2 |
| 99997 | 1 | 0 | 0 | 0 | 6 | 5 | 3 |
| 99998 | 0 | 2 | 0 | 1 | 1 | 0 | 2 |
| 99999 | 1 | 2 | 1 | 0 | 7 | 0 | 1 |
100000 rows × 7 columns
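The integer codes in this table match alphabetical label encoding (for example Indore = 6, Delhi = 4). A minimal sketch, assuming scikit-learn's `LabelEncoder` is applied per column:

```python
from sklearn.preprocessing import LabelEncoder

# Unique City values as listed earlier in the notebook output.
cities = ["Indore", "Delhi", "Chennai", "Surat", "Lucknow", "Jaipur",
          "Hyderabad", "Nagpur", "Kolkata", "Ahmedabad", "Pune", "Mumbai",
          "Bengaluru", "Bhopal", "Vadodara"]

encoder = LabelEncoder()
codes = encoder.fit_transform(cities)

# classes_ is sorted alphabetically; the position is the assigned code.
mapping = {str(label): idx for idx, label in enumerate(encoder.classes_)}
print(mapping)
```

These codes carry no real order. Distance-based and linear models (KNN, Logit) will still treat them as numeric, which is one possible reason to consider one-hot encoding for nominal columns such as City.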
Base Model¶
| | Gender | Age | Education_Level | Employment_Status | City | Device_Type | Internet_Connection_Quality | Course_Name | Category | Course_Level | Course_Duration_Days | Instructor_Rating | Login_Frequency | Average_Session_Duration_Min | Video_Completion_Rate | Discussion_Participation | Time_Spent_Hours | Days_Since_Last_Login | Notifications_Checked | Peer_Interaction_Score | Assignments_Submitted | Assignments_Missed | Quiz_Attempts | Quiz_Score_Avg | Project_Grade | Progress_Percentage | Rewatch_Count | Payment_Mode | Fee_Paid | Discount_Used | Payment_Amount | App_Usage_Percentage | Reminder_Emails_Clicked | Support_Tickets_Raised | Satisfaction_Rating | Completed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 19.0 | 3.0 | 2 | 6 | 0 | 1.0 | 0 | 4 | 1.0 | 60 | 4.7 | 3.0 | 30 | 55.0 | 2.0 | 0.5 | 1 | 6 | 4.3 | 8.0 | 1 | 5 | 80.9 | 71.2 | 70.8 | 0.0 | 4 | 0 | 0 | 1740 | 49.0 | 3 | 4 | 3.5 | 0 |
| 1 | 0 | 17.0 | 2.0 | 2 | 4 | 0 | 2.0 | 4 | 4 | 0.0 | 90 | 4.6 | 4.0 | 37 | 84.1 | 2.0 | 0.9 | 3 | 5 | 7.8 | 4.0 | 6 | 3 | 78.4 | 42.5 | 55.6 | 2.0 | 0 | 1 | 0 | 6147 | 86.0 | 0 | 0 | 4.5 | 1 |
| 2 | 0 | 34.0 | 1.0 | 2 | 3 | 1 | 1.0 | 5 | 4 | 2.0 | 45 | 4.6 | 5.0 | 9 | 75.6 | 3.0 | 0.5 | 19 | 5 | 6.7 | 8.0 | 2 | 3 | 100.0 | 87.9 | 78.8 | 2.0 | 3 | 1 | 0 | 4280 | 85.0 | 1 | 0 | 5.0 | 0 |
| 3 | 0 | 29.0 | 3.0 | 0 | 13 | 1 | 0.0 | 7 | 1 | 2.0 | 40 | 4.4 | 2.0 | 27 | 63.3 | 1.0 | 7.4 | 19 | 9 | 6.4 | 0.0 | 10 | 4 | 59.1 | 51.4 | 24.7 | 4.0 | 5 | 1 | 0 | 3812 | 42.0 | 2 | 3 | 3.8 | 0 |
| 4 | 0 | 19.0 | 1.0 | 1 | 9 | 0 | 1.0 | 4 | 4 | 0.0 | 90 | 4.6 | 2.0 | 36 | 86.4 | 1.0 | 0.5 | 4 | 7 | 7.5 | 5.0 | 5 | 8 | 84.8 | 93.0 | 64.9 | 4.0 | 1 | 1 | 1 | 5486 | 91.0 | 3 | 0 | 4.0 | 0 |
Scaling¶
Robust Scaling
We chose robust scaling because the dataset contains many outliers.
Adding Constant
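A minimal sketch of both steps on a toy column, assuming scikit-learn's `RobustScaler` with default settings and a manually prepended column of ones (statsmodels' `add_constant` does the equivalent for a DataFrame):

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Toy column with one extreme value; not taken from the dataset.
x = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# RobustScaler subtracts the median and divides by the IQR (Q3 - Q1),
# so the outlier does not set the scale for the other rows.
scaler = RobustScaler()
x_scaled = scaler.fit_transform(x)

# Prepend a column of ones so a model without a built-in intercept
# (such as statsmodels Logit) can estimate one.
x_const = np.column_stack([np.ones(len(x_scaled)), x_scaled])
print(x_const)
```

Here the median is 3 and the IQR is 2, so the regular rows land in [-1, 0.5] while the outlier stays far away at 48.5. In a train/test workflow the scaler would normally be fitted on the training rows only and then applied to the test rows.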
Base Model - 1 - Logit¶
Optimization terminated successfully.
Current function value: 0.657366
Iterations 5
Logit Regression Results
==============================================================================
Dep. Variable: Completed No. Observations: 80000
Model: Logit Df Residuals: 79964
Method: MLE Df Model: 35
Date: Mon, 05 Jan 2026 Pseudo R-squ.: 0.05132
Time: 11:13:24 Log-Likelihood: -52589.
converged: True LL-Null: -55434.
Covariance Type: nonrobust LLR p-value: 0.000
================================================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------------------------
const 0.1838 0.082 2.244 0.025 0.023 0.344
Gender 0.0247 0.014 1.815 0.070 -0.002 0.051
Age 0.0065 0.011 0.592 0.554 -0.015 0.028
Education_Level 0.0048 0.008 0.600 0.548 -0.011 0.020
Employment_Status 0.0080 0.007 1.119 0.263 -0.006 0.022
City 0.0010 0.002 0.585 0.558 -0.002 0.004
Device_Type -0.0050 0.013 -0.380 0.704 -0.031 0.021
Internet_Connection_Quality 0.0068 0.011 0.626 0.531 -0.015 0.028
Course_Name -0.0002 0.004 -0.056 0.956 -0.008 0.007
Category -0.0022 0.010 -0.210 0.834 -0.023 0.018
Course_Level -0.0141 0.034 -0.412 0.681 -0.081 0.053
Course_Duration_Days 0.0272 0.044 0.623 0.533 -0.058 0.113
Instructor_Rating 0.0198 0.023 0.860 0.390 -0.025 0.065
Login_Frequency 0.0136 0.013 1.088 0.277 -0.011 0.038
Average_Session_Duration_Min -0.0052 0.010 -0.511 0.610 -0.025 0.015
Video_Completion_Rate -0.2905 0.045 -6.407 0.000 -0.379 -0.202
Discussion_Participation -0.0049 0.010 -0.495 0.620 -0.024 0.014
Time_Spent_Hours -0.2717 0.012 -23.605 0.000 -0.294 -0.249
Days_Since_Last_Login 0.0759 0.009 8.879 0.000 0.059 0.093
Notifications_Checked -0.0034 0.009 -0.360 0.719 -0.022 0.015
Peer_Interaction_Score -0.0100 0.011 -0.947 0.344 -0.031 0.011
Assignments_Submitted -0.0144 0.034 -0.427 0.669 -0.081 0.052
Assignments_Missed 0.0332 0.045 0.742 0.458 -0.054 0.121
Quiz_Attempts 0.0246 0.011 2.212 0.027 0.003 0.046
Quiz_Score_Avg -0.1561 0.010 -15.039 0.000 -0.176 -0.136
Project_Grade 0.0050 0.011 0.443 0.658 -0.017 0.027
Progress_Percentage -0.3959 0.074 -5.318 0.000 -0.542 -0.250
Rewatch_Count -0.0087 0.010 -0.867 0.386 -0.028 0.011
Payment_Mode -0.0035 0.004 -0.865 0.387 -0.011 0.004
Fee_Paid -0.2111 0.051 -4.154 0.000 -0.311 -0.111
Discount_Used -0.0325 0.019 -1.692 0.091 -0.070 0.005
Payment_Amount -0.1705 0.041 -4.128 0.000 -0.252 -0.090
App_Usage_Percentage -0.0284 0.010 -2.722 0.006 -0.049 -0.008
Reminder_Emails_Clicked 0.0087 0.009 0.923 0.356 -0.010 0.027
Support_Tickets_Raised -0.0047 0.008 -0.602 0.547 -0.020 0.011
Satisfaction_Rating 0.0179 0.012 1.522 0.128 -0.005 0.041
================================================================================================
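Two quick checks on the summary above can be sketched in plain Python: converting a few significant coefficients to odds ratios, and recomputing McFadden's pseudo R-squared from the two log-likelihoods. The numbers are copied from the table; the interpretation assumes the features were robust-scaled with default settings.

```python
import math

# Coefficients copied from the Logit summary above.
coefs = {
    "Time_Spent_Hours": -0.2717,
    "Days_Since_Last_Login": 0.0759,
    "Quiz_Score_Avg": -0.1561,
}

# exp(coef) is the multiplicative change in the odds of class 1
# for a one-unit increase in the (scaled) feature.
odds_ratios = {name: math.exp(b) for name, b in coefs.items()}
for name, ratio in odds_ratios.items():
    print(f"{name}: {ratio:.3f}")

# McFadden pseudo R-squared = 1 - LL_model / LL_null.
ll_model, ll_null = -52589.0, -55434.0
pseudo_r2 = 1 - ll_model / ll_null
print(round(pseudo_r2, 4))
```

A pseudo R-squared near 0.05 indicates that the features explain little beyond the intercept-only model, which fits with the test accuracy of about 0.60 reported below.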
User def function for evaluation metrics¶
75721 0.423102
80184 0.329525
19864 0.632966
76699 0.314073
92991 0.310832
...
32595 0.461341
29313 0.310339
37862 0.528044
53421 0.460315
42410 0.515216
Length: 20000, dtype: float64
[0, 0, 1, 0, 0]
precision recall f1-score support
0 0.60 0.58 0.59 9862
1 0.60 0.62 0.61 10138
accuracy 0.60 20000
macro avg 0.60 0.60 0.60 20000
weighted avg 0.60 0.60 0.60 20000
Name Accuracy Precision Recall F1 Score
0 Logit-Model-1(Test) 0.602 0.604 0.62 0.612
75220    0.589794
48955    0.638164
44966    0.431316
13568    0.631952
92727    0.547524
dtype: float64
[1, 1, 0, 1, 1]
precision recall f1-score support
0 0.60 0.59 0.59 39168
1 0.61 0.63 0.62 40832
accuracy 0.61 80000
macro avg 0.61 0.61 0.61 80000
weighted avg 0.61 0.61 0.61 80000
Name Accuracy Precision Recall F1 Score
0 Logit-Model-1(Test) 0.602 0.604 0.620 0.612
1 Logit-Model-1(Train) 0.608 0.614 0.629 0.621
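The evaluation helper itself is not visible in this export. A minimal sketch of what such a function could look like, assuming a 0.5 probability cut-off and a running score card; the names `score_card` and `update_score_card` are hypothetical:

```python
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical running table that collects one row per model and split.
score_card = pd.DataFrame(
    columns=["Name", "Accuracy", "Precision", "Recall", "F1 Score"]
)

def update_score_card(name, y_true, y_prob, threshold=0.5):
    """Threshold probabilities, compute metrics, append a row to score_card."""
    y_pred = [1 if p > threshold else 0 for p in y_prob]
    score_card.loc[len(score_card)] = [
        name,
        round(accuracy_score(y_true, y_pred), 3),
        round(precision_score(y_true, y_pred), 3),
        round(recall_score(y_true, y_pred), 3),
        round(f1_score(y_true, y_pred), 3),
    ]
    return y_pred

# First five test probabilities from the output above; y_true is made up.
y_prob = [0.423102, 0.329525, 0.632966, 0.314073, 0.310832]
y_true = [0, 0, 1, 1, 0]
pred = update_score_card("Demo(Test)", y_true, y_prob)
print(pred)
print(score_card)
```

With this cut-off the five probabilities map to `[0, 0, 1, 0, 0]`, matching the labels printed in the notebook output.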
Checking the imbalance of the target variable¶
Completed
1    50970
0    49030
Name: count, dtype: int64
Inference:
- Class 1 (50,970 rows) is slightly more frequent than class 0 (49,030 rows); the imbalance is negligible.
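The class counts also give a useful reference point: always predicting the majority class would already reach the majority share in accuracy. A small sketch using the counts printed above:

```python
# Class counts copied from the value_counts output above.
counts = {1: 50970, 0: 49030}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# Accuracy of a trivial model that always predicts the majority class.
baseline_accuracy = max(shares.values())
print(shares, baseline_accuracy)
```

Model accuracies of roughly 0.52 to 0.60 are best read against this baseline of about 0.51 rather than against 0.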
Model-2-KNN¶
KNeighborsClassifier()
array([1, 1, 1, 1, 0])
precision recall f1-score support
0 0.53 0.52 0.53 9862
1 0.55 0.56 0.55 10138
accuracy 0.54 20000
macro avg 0.54 0.54 0.54 20000
weighted avg 0.54 0.54 0.54 20000
Name Accuracy Precision Recall F1 Score
0 Logit-Model-1(Test) 0.602 0.604 0.620 0.612
1 Logit-Model-1(Train) 0.608 0.614 0.629 0.621
2 KNN-Model-2(Test) 0.540 0.545 0.559 0.552
array([1, 1, 0, 0, 0])
precision recall f1-score support
0 0.70 0.70 0.70 39168
1 0.71 0.72 0.72 40832
accuracy 0.71 80000
macro avg 0.71 0.71 0.71 80000
weighted avg 0.71 0.71 0.71 80000
Name Accuracy Precision Recall F1 Score
0 Logit-Model-1(Test) 0.602 0.604 0.620 0.612
1 Logit-Model-1(Train) 0.608 0.614 0.629 0.621
2 KNN-Model-2(Test) 0.540 0.545 0.559 0.552
3 KNN-Model-2(Train) 0.709 0.714 0.718 0.716
array([[0.2, 0.8],
[0.4, 0.6],
[0.4, 0.6],
[0.4, 0.6],
[1. , 0. ]])
array([0.8, 0.6, 0.6, ..., 0.8, 0.8, 0.4])
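The probabilities above are all multiples of 0.2 because `KNeighborsClassifier()` defaults to `n_neighbors=5`, and each probability is the share of the 5 nearest neighbours in that class. A minimal sketch on synthetic data (not the course dataset):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data; sizes are arbitrary.
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

knn = KNeighborsClassifier()  # default n_neighbors=5, uniform weights
knn.fit(X, y)

# Each row holds [P(class 0), P(class 1)] as neighbour vote shares.
proba = knn.predict_proba(X[:10])
print(proba)
```

Because the scores take only six distinct values, threshold-based analysis of this model is coarse; a larger `n_neighbors` gives finer-grained probabilities and usually narrows the train/test gap seen in the score card.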
Model-3-GaussianNB¶
GaussianNB()
array([1, 0, 1, 0, 0])
precision recall f1-score support
0 0.59 0.60 0.60 9862
1 0.61 0.59 0.60 10138
accuracy 0.60 20000
macro avg 0.60 0.60 0.60 20000
weighted avg 0.60 0.60 0.60 20000
Name Accuracy Precision Recall F1 Score
0 Logit-Model-1(Test) 0.602 0.604 0.620 0.612
1 Logit-Model-1(Train) 0.608 0.614 0.629 0.621
2 KNN-Model-2(Test) 0.540 0.545 0.559 0.552
3 KNN-Model-2(Train) 0.709 0.714 0.718 0.716
4 GaussianNB-Model-3(Test) 0.598 0.606 0.591 0.598
array([1, 1, 0, 1, 1])
precision recall f1-score support
0 0.59 0.60 0.60 39168
1 0.61 0.60 0.60 40832
accuracy 0.60 80000
macro avg 0.60 0.60 0.60 80000
weighted avg 0.60 0.60 0.60 80000
Name Accuracy Precision Recall F1 Score
0 Logit-Model-1(Test) 0.602 0.604 0.620 0.612
1 Logit-Model-1(Train) 0.608 0.614 0.629 0.621
2 KNN-Model-2(Test) 0.540 0.545 0.559 0.552
3 KNN-Model-2(Train) 0.709 0.714 0.718 0.716
4 GaussianNB-Model-3(Test) 0.598 0.606 0.591 0.598
5 GaussianNB-Model-3(Train) 0.599 0.610 0.597 0.603
array([[0.47449917, 0.52550083],
[0.87242511, 0.12757489],
[0.48429252, 0.51570748],
[0.87412849, 0.12587151],
[0.89743128, 0.10256872]])
array([0.52550083, 0.12757489, 0.51570748, ..., 0.5322441 , 0.30681517,
0.43249505])
Model-4-Decision Tree¶
DecisionTreeClassifier(random_state=42)
array([1, 0, 1, 0, 0])
precision recall f1-score support
0 0.52 0.52 0.52 9862
1 0.53 0.53 0.53 10138
accuracy 0.52 20000
macro avg 0.52 0.52 0.52 20000
weighted avg 0.52 0.52 0.52 20000
Name Accuracy Precision Recall F1 Score
0 Logit-Model-1(Test) 0.602 0.604 0.620 0.612
1 Logit-Model-1(Train) 0.608 0.614 0.629 0.621
2 KNN-Model-2(Test) 0.540 0.545 0.559 0.552
3 KNN-Model-2(Train) 0.709 0.714 0.718 0.716
4 GaussianNB-Model-3(Test) 0.598 0.606 0.591 0.598
5 GaussianNB-Model-3(Train) 0.599 0.610 0.597 0.603
6 Decision Tree-Model-4(Test) 0.523 0.530 0.525 0.528
array([1, 1, 0, 1, 1])
precision recall f1-score support
0 1.00 1.00 1.00 39168
1 1.00 1.00 1.00 40832
accuracy 1.00 80000
macro avg 1.00 1.00 1.00 80000
weighted avg 1.00 1.00 1.00 80000
Name Accuracy Precision Recall F1 Score
0 Logit-Model-1(Test) 0.602 0.604 0.620 0.612
1 Logit-Model-1(Train) 0.608 0.614 0.629 0.621
2 KNN-Model-2(Test) 0.540 0.545 0.559 0.552
3 KNN-Model-2(Train) 0.709 0.714 0.718 0.716
4 GaussianNB-Model-3(Test) 0.598 0.606 0.591 0.598
5 GaussianNB-Model-3(Train) 0.599 0.610 0.597 0.603
6 Decision Tree-Model-4(Test) 0.523 0.530 0.525 0.528
7 Decision Tree-Model-4(Train) 1.000 1.000 1.000 1.000
array([[0., 1.],
[1., 0.],
[0., 1.],
[1., 0.],
[1., 0.]])
array([1., 0., 1., ..., 1., 1., 1.])
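A train score of 1.000 next to a test score near 0.52 is the typical signature of an unconstrained tree memorising the training rows, and the 0/1 probabilities come from pure leaves. A minimal sketch on noisy synthetic data (not the course dataset) comparing an unconstrained tree with a depth-limited one:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# flip_y=0.3 assigns 30% of the labels at random to mimic a weak signal.
X, y = make_classification(
    n_samples=1000, n_features=10, flip_y=0.3, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

full = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
shallow = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)

for name, model in [("unconstrained", full), ("max_depth=3", shallow)]:
    train_acc = model.score(X_train, y_train)
    test_acc = model.score(X_test, y_test)
    print(f"{name}: train={train_acc:.3f} test={test_acc:.3f}")
```

The unconstrained tree fits the training split perfectly while the shallow tree cannot, which is the motivation for limiting depth or leaf count when tuning.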
Feature Importance-DT¶
array([[0.2, 0.8],
[0.4, 0.6],
[0.4, 0.6],
[0.4, 0.6],
[1. , 0. ]])
Model-5-Decision Tree(FI)¶
NOTE: The columns 'Course_Level' and 'Discount_Used' are dropped for this model.
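How the two columns were selected is not shown in this export. One common approach, sketched here under that assumption, is to rank features by the fitted tree's `feature_importances_` and drop the lowest-ranked ones. The data and feature names below are synthetic and hypothetical.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with hypothetical feature names f0..f5.
X_arr, y = make_classification(n_samples=500, n_features=6, random_state=42)
cols = [f"f{i}" for i in range(6)]
X = pd.DataFrame(X_arr, columns=cols)

tree = DecisionTreeClassifier(random_state=42).fit(X, y)

# Impurity-based importances sum to 1; sort ascending to find weak features.
importance = pd.Series(tree.feature_importances_, index=cols).sort_values()
lowest_two = importance.index[:2].tolist()

X_reduced = X.drop(columns=lowest_two)
print(importance)
print("dropped:", lowest_two)
```

Importances taken from an overfitted tree can be unstable, so the ranking is a heuristic rather than proof that a feature is uninformative.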
DecisionTreeClassifier(random_state=42)
array([0, 0, 1, 0, 0])
precision recall f1-score support
0 0.52 0.53 0.53 9862
1 0.54 0.53 0.53 10138
accuracy 0.53 20000
macro avg 0.53 0.53 0.53 20000
weighted avg 0.53 0.53 0.53 20000
Name Accuracy Precision Recall F1 Score
0 Logit-Model-1(Test) 0.602 0.604 0.620 0.612
1 Logit-Model-1(Train) 0.608 0.614 0.629 0.621
2 KNN-Model-2(Test) 0.540 0.545 0.559 0.552
3 KNN-Model-2(Train) 0.709 0.714 0.718 0.716
4 GaussianNB-Model-3(Test) 0.598 0.606 0.591 0.598
5 GaussianNB-Model-3(Train) 0.599 0.610 0.597 0.603
6 Decision Tree-Model-4(Test) 0.523 0.530 0.525 0.528
7 Decision Tree-Model-4(Train) 1.000 1.000 1.000 1.000
8 Decision Tree(FI)-Model-5(Test) 0.530 0.537 0.531 0.534
array([1, 1, 0, 1, 1])
precision recall f1-score support
0 1.00 1.00 1.00 39168
1 1.00 1.00 1.00 40832
accuracy 1.00 80000
macro avg 1.00 1.00 1.00 80000
weighted avg 1.00 1.00 1.00 80000
Name Accuracy Precision Recall F1 Score
0 Logit-Model-1(Test) 0.602 0.604 0.620 0.612
1 Logit-Model-1(Train) 0.608 0.614 0.629 0.621
2 KNN-Model-2(Test) 0.540 0.545 0.559 0.552
3 KNN-Model-2(Train) 0.709 0.714 0.718 0.716
4 GaussianNB-Model-3(Test) 0.598 0.606 0.591 0.598
5 GaussianNB-Model-3(Train) 0.599 0.610 0.597 0.603
6 Decision Tree-Model-4(Test) 0.523 0.530 0.525 0.528
7 Decision Tree-Model-4(Train) 1.000 1.000 1.000 1.000
8 Decision Tree(FI)-Model-5(Test) 0.530 0.537 0.531 0.534
9 Decision Tree(FI)-Model-5(Train) 1.000 1.000 1.000 1.000
array([[1., 0.],
[1., 0.],
[0., 1.],
[1., 0.],
[1., 0.]])
array([0., 0., 1., ..., 0., 1., 1.])
Model-6-Decision Tree(Tuned)¶
{'criterion': 'entropy',
'max_depth': 5,
'max_features': 'sqrt',
'max_leaf_nodes': 8,
'min_samples_leaf': 1,
'min_samples_split': 2}
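The dictionary above has the shape of `GridSearchCV.best_params_`. A minimal sketch of such a search for a decision tree, assuming a small illustrative grid and synthetic data; the notebook's actual grid is not shown here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; sizes are arbitrary.
X, y = make_classification(n_samples=300, n_features=8, random_state=42)

# Illustrative grid; the values are assumptions, not the notebook's.
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [3, 5],
    "max_leaf_nodes": [4, 8],
}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X, y)
print(search.best_params_)
print(round(search.best_score_, 3))
```

`best_score_` is the mean cross-validated score of the winning combination, which is a fairer guide than training accuracy when comparing against the untuned tree.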
array([0, 1, 1, 0, 0])
precision recall f1-score support
0 0.53 0.65 0.58 9862
1 0.56 0.44 0.50 10138
accuracy 0.54 20000
macro avg 0.55 0.55 0.54 20000
weighted avg 0.55 0.54 0.54 20000
Name Accuracy Precision Recall F1 Score
0 Logit-Model-1(Test) 0.602 0.604 0.620 0.612
1 Logit-Model-1(Train) 0.608 0.614 0.629 0.621
2 KNN-Model-2(Test) 0.540 0.545 0.559 0.552
3 KNN-Model-2(Train) 0.709 0.714 0.718 0.716
4 GaussianNB-Model-3(Test) 0.598 0.606 0.591 0.598
5 GaussianNB-Model-3(Train) 0.599 0.610 0.597 0.603
6 Decision Tree-Model-4(Test) 0.523 0.530 0.525 0.528
7 Decision Tree-Model-4(Train) 1.000 1.000 1.000 1.000
8 Decision Tree(FI)-Model-5(Test) 0.530 0.537 0.531 0.534
9 Decision Tree(FI)-Model-5(Train) 1.000 1.000 1.000 1.000
10 Decision Tree(Tuned)-Model-6(Test) 0.544 0.564 0.444 0.497
array([1, 1, 1, 1, 0])
precision recall f1-score support
0 0.54 0.66 0.59 39168
1 0.58 0.45 0.51 40832
accuracy 0.55 80000
macro avg 0.56 0.56 0.55 80000
weighted avg 0.56 0.55 0.55 80000
Name Accuracy Precision Recall F1 Score
0 Logit-Model-1(Test) 0.602 0.604 0.620 0.612
1 Logit-Model-1(Train) 0.608 0.614 0.629 0.621
2 KNN-Model-2(Test) 0.540 0.545 0.559 0.552
3 KNN-Model-2(Train) 0.709 0.714 0.718 0.716
4 GaussianNB-Model-3(Test) 0.598 0.606 0.591 0.598
5 GaussianNB-Model-3(Train) 0.599 0.610 0.597 0.603
6 Decision Tree-Model-4(Test) 0.523 0.530 0.525 0.528
7 Decision Tree-Model-4(Train) 1.000 1.000 1.000 1.000
8 Decision Tree(FI)-Model-5(Test) 0.530 0.537 0.531 0.534
9 Decision Tree(FI)-Model-5(Train) 1.000 1.000 1.000 1.000
10 Decision Tree(Tuned)-Model-6(Test) 0.544 0.564 0.444 0.497
11 Decision Tree(Tuned)-Model-6(Train) 0.553 0.580 0.452 0.508
array([[0.50520472, 0.49479528],
[0.49824679, 0.50175321],
[0.49824679, 0.50175321],
[0.52053807, 0.47946193],
[0.52053807, 0.47946193]])
array([0.49479528, 0.50175321, 0.50175321, ..., 0.50175321, 0.47946193,
0.47012328])
Model-7-Random Forest¶
RandomForestClassifier(random_state=42)
array([0, 0, 1, 0, 0])
precision recall f1-score support
0 0.58 0.59 0.59 9862
1 0.60 0.59 0.59 10138
accuracy 0.59 20000
macro avg 0.59 0.59 0.59 20000
weighted avg 0.59 0.59 0.59 20000
Name Accuracy Precision Recall F1 Score
0 Logit-Model-1(Test) 0.602 0.604 0.620 0.612
1 Logit-Model-1(Train) 0.608 0.614 0.629 0.621
2 KNN-Model-2(Test) 0.540 0.545 0.559 0.552
3 KNN-Model-2(Train) 0.709 0.714 0.718 0.716
4 GaussianNB-Model-3(Test) 0.598 0.606 0.591 0.598
5 GaussianNB-Model-3(Train) 0.599 0.610 0.597 0.603
6 Decision Tree-Model-4(Test) 0.523 0.530 0.525 0.528
7 Decision Tree-Model-4(Train) 1.000 1.000 1.000 1.000
8 Decision Tree(FI)-Model-5(Test) 0.530 0.537 0.531 0.534
9 Decision Tree(FI)-Model-5(Train) 1.000 1.000 1.000 1.000
10 Decision Tree(Tuned)-Model-6(Test) 0.544 0.564 0.444 0.497
11 Decision Tree(Tuned)-Model-6(Train) 0.553 0.580 0.452 0.508
12 Random Forest-Model-7(Test) 0.589 0.596 0.588 0.592
array([1, 1, 0, 1, 1])
precision recall f1-score support
0 1.00 1.00 1.00 39168
1 1.00 1.00 1.00 40832
accuracy 1.00 80000
macro avg 1.00 1.00 1.00 80000
weighted avg 1.00 1.00 1.00 80000
Name Accuracy Precision Recall F1 Score
0 Logit-Model-1(Test) 0.602 0.604 0.620 0.612
1 Logit-Model-1(Train) 0.608 0.614 0.629 0.621
2 KNN-Model-2(Test) 0.540 0.545 0.559 0.552
3 KNN-Model-2(Train) 0.709 0.714 0.718 0.716
4 GaussianNB-Model-3(Test) 0.598 0.606 0.591 0.598
5 GaussianNB-Model-3(Train) 0.599 0.610 0.597 0.603
6 Decision Tree-Model-4(Test) 0.523 0.530 0.525 0.528
7 Decision Tree-Model-4(Train) 1.000 1.000 1.000 1.000
8 Decision Tree(FI)-Model-5(Test) 0.530 0.537 0.531 0.534
9 Decision Tree(FI)-Model-5(Train) 1.000 1.000 1.000 1.000
10 Decision Tree(Tuned)-Model-6(Test) 0.544 0.564 0.444 0.497
11 Decision Tree(Tuned)-Model-6(Train) 0.553 0.580 0.452 0.508
12 Random Forest-Model-7(Test) 0.589 0.596 0.588 0.592
13 Random Forest-Model-7(Train) 1.000 1.000 1.000 1.000
array([[0.58, 0.42],
[0.72, 0.28],
[0.38, 0.62],
[0.67, 0.33],
[0.63, 0.37]])
array([0.42, 0.28, 0.62, ..., 0.53, 0.38, 0.52])
Model-8-Random Forest(Tuned)¶
GridSearchCV(cv=5, estimator=RandomForestClassifier(random_state=42), n_jobs=-1,
param_grid=[{'criterion': ['entropy'], 'max_depth': [10, 12],
'max_features': ['sqrt', 'log2'],
'max_leaf_nodes': [9, 11], 'min_samples_leaf': [5, 7],
'min_samples_split': [2, 4], 'n_estimators': [100]}])
RandomForestClassifier(criterion='entropy', max_depth=10, max_leaf_nodes=11,
min_samples_leaf=5, random_state=42)
{'criterion': 'entropy',
'max_depth': 10,
'max_features': 'sqrt',
'max_leaf_nodes': 11,
'min_samples_leaf': 5,
'min_samples_split': 2,
'n_estimators': 100}
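The cost of this search can be estimated before running it. A short sketch that counts the candidate combinations in the grid shown above with scikit-learn's `ParameterGrid`, then multiplies by the 5 folds:

```python
from sklearn.model_selection import ParameterGrid

# Grid copied from the GridSearchCV repr above.
param_grid = [{
    "criterion": ["entropy"],
    "max_depth": [10, 12],
    "max_features": ["sqrt", "log2"],
    "max_leaf_nodes": [9, 11],
    "min_samples_leaf": [5, 7],
    "min_samples_split": [2, 4],
    "n_estimators": [100],
}]

n_candidates = len(ParameterGrid(param_grid))
n_fits = n_candidates * 5  # cv=5; the final refit adds one more fit
print(n_candidates, n_fits)
```

Some of these settings likely overlap. A tree capped at 11 leaves has at most 10 splits, so a depth limit of 10 or 12 should never bind, and with `min_samples_leaf` of 5 or more a node needs at least 10 samples to split, so `min_samples_split` of 2 or 4 should not bind either. Widening `max_leaf_nodes` may therefore explore more genuinely different models than the current grid.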
array([1, 0, 1, 0, 0])
precision recall f1-score support
0 0.59 0.56 0.58 9862
1 0.60 0.63 0.61 10138
accuracy 0.59 20000
macro avg 0.59 0.59 0.59 20000
weighted avg 0.59 0.59 0.59 20000
Name Accuracy Precision Recall F1 Score
0 Logit-Model-1(Test) 0.602 0.604 0.620 0.612
1 Logit-Model-1(Train) 0.608 0.614 0.629 0.621
2 KNN-Model-2(Test) 0.540 0.545 0.559 0.552
3 KNN-Model-2(Train) 0.709 0.714 0.718 0.716
4 GaussianNB-Model-3(Test) 0.598 0.606 0.591 0.598
5 GaussianNB-Model-3(Train) 0.599 0.610 0.597 0.603
6 Decision Tree-Model-4(Test) 0.523 0.530 0.525 0.528
7 Decision Tree-Model-4(Train) 1.000 1.000 1.000 1.000
8 Decision Tree(FI)-Model-5(Test) 0.530 0.537 0.531 0.534
9 Decision Tree(FI)-Model-5(Train) 1.000 1.000 1.000 1.000
10 Decision Tree(Tuned)-Model-6(Test) 0.544 0.564 0.444 0.497
11 Decision Tree(Tuned)-Model-6(Train) 0.553 0.580 0.452 0.508
12 Random Forest-Model-7(Test) 0.589 0.596 0.588 0.592
13 Random Forest-Model-7(Train) 1.000 1.000 1.000 1.000
14 Random Forest(Tuned)-Model-8(Test) 0.595 0.595 0.626 0.611
array([1, 1, 0, 1, 1])
precision recall f1-score support
0 0.59 0.56 0.58 39168
1 0.60 0.63 0.62 40832
accuracy 0.60 80000
macro avg 0.60 0.60 0.60 80000
weighted avg 0.60 0.60 0.60 80000
Name Accuracy Precision Recall F1 Score
0 Logit-Model-1(Test) 0.602 0.604 0.620 0.612
1 Logit-Model-1(Train) 0.608 0.614 0.629 0.621
2 KNN-Model-2(Test) 0.540 0.545 0.559 0.552
3 KNN-Model-2(Train) 0.709 0.714 0.718 0.716
4 GaussianNB-Model-3(Test) 0.598 0.606 0.591 0.598
5 GaussianNB-Model-3(Train) 0.599 0.610 0.597 0.603
6 Decision Tree-Model-4(Test) 0.523 0.530 0.525 0.528
7 Decision Tree-Model-4(Train) 1.000 1.000 1.000 1.000
8 Decision Tree(FI)-Model-5(Test) 0.530 0.537 0.531 0.534
9 Decision Tree(FI)-Model-5(Train) 1.000 1.000 1.000 1.000
10 Decision Tree(Tuned)-Model-6(Test) 0.544 0.564 0.444 0.497
11 Decision Tree(Tuned)-Model-6(Train) 0.553 0.580 0.452 0.508
12 Random Forest-Model-7(Test) 0.589 0.596 0.588 0.592
13 Random Forest-Model-7(Train) 1.000 1.000 1.000 1.000
14 Random Forest(Tuned)-Model-8(Test) 0.595 0.595 0.626 0.611
15 Random Forest(Tuned)-Model-8(Train) 0.598 0.601 0.632 0.616
array([[0.49552448, 0.50447552],
[0.58604652, 0.41395348],
[0.440965 , 0.559035 ],
[0.56627823, 0.43372177],
[0.59255747, 0.40744253]])
array([0.50447552, 0.41395348, 0.559035 , ..., 0.48969224, 0.4456162 ,
0.52082183])
Model-9-AdaBoostClassifier¶
AdaBoostClassifier(random_state=42)
array([0, 0, 1, 0, 0])
precision recall f1-score support
0 0.60 0.58 0.59 9862
1 0.60 0.62 0.61 10138
accuracy 0.60 20000
macro avg 0.60 0.60 0.60 20000
weighted avg 0.60 0.60 0.60 20000
Name Accuracy Precision Recall F1 Score
0 Logit-Model-1(Test) 0.602 0.604 0.620 0.612
1 Logit-Model-1(Train) 0.608 0.614 0.629 0.621
2 KNN-Model-2(Test) 0.540 0.545 0.559 0.552
3 KNN-Model-2(Train) 0.709 0.714 0.718 0.716
4 GaussianNB-Model-3(Test) 0.598 0.606 0.591 0.598
5 GaussianNB-Model-3(Train) 0.599 0.610 0.597 0.603
6 Decision Tree-Model-4(Test) 0.523 0.530 0.525 0.528
7 Decision Tree-Model-4(Train) 1.000 1.000 1.000 1.000
8 Decision Tree(FI)-Model-5(Test) 0.530 0.537 0.531 0.534
9 Decision Tree(FI)-Model-5(Train) 1.000 1.000 1.000 1.000
10 Decision Tree(Tuned)-Model-6(Test) 0.544 0.564 0.444 0.497
11 Decision Tree(Tuned)-Model-6(Train) 0.553 0.580 0.452 0.508
12 Random Forest-Model-7(Test) 0.589 0.596 0.588 0.592
13 Random Forest-Model-7(Train) 1.000 1.000 1.000 1.000
14 Random Forest(Tuned)-Model-8(Test) 0.595 0.595 0.626 0.611
15 Random Forest(Tuned)-Model-8(Train) 0.598 0.601 0.632 0.616
16 AdaBoostClassifier-Model-9(Test) 0.601 0.604 0.618 0.611
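The F1 Score column is the harmonic mean of the Precision and Recall columns. As a minimal check, assuming the scoreboard values are rounded to three decimals, the AdaBoost test row can be reproduced by hand. The helper name `f1_from` is illustrative and does not come from the notebook.

```python
def f1_from(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the AdaBoostClassifier-Model-9(Test) row.
ada_test_f1 = f1_from(0.604, 0.618)
print(round(ada_test_f1, 3))
```

The same helper applied to the Logit-Model-1(Test) row (0.604, 0.620) gives that row's F1 as well, up to rounding.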
array([1, 1, 0, 1, 1])
precision recall f1-score support
0 0.60 0.59 0.59 39168
1 0.61 0.62 0.62 40832
accuracy 0.61 80000
macro avg 0.61 0.61 0.61 80000
weighted avg 0.61 0.61 0.61 80000
Name Accuracy Precision Recall F1 Score
0 Logit-Model-1(Test) 0.602 0.604 0.620 0.612
1 Logit-Model-1(Train) 0.608 0.614 0.629 0.621
2 KNN-Model-2(Test) 0.540 0.545 0.559 0.552
3 KNN-Model-2(Train) 0.709 0.714 0.718 0.716
4 GaussianNB-Model-3(Test) 0.598 0.606 0.591 0.598
5 GaussianNB-Model-3(Train) 0.599 0.610 0.597 0.603
6 Decision Tree-Model-4(Test) 0.523 0.530 0.525 0.528
7 Decision Tree-Model-4(Train) 1.000 1.000 1.000 1.000
8 Decision Tree(FI)-Model-5(Test) 0.530 0.537 0.531 0.534
9 Decision Tree(FI)-Model-5(Train) 1.000 1.000 1.000 1.000
10 Decision Tree(Tuned)-Model-6(Test) 0.544 0.564 0.444 0.497
11 Decision Tree(Tuned)-Model-6(Train) 0.553 0.580 0.452 0.508
12 Random Forest-Model-7(Test) 0.589 0.596 0.588 0.592
13 Random Forest-Model-7(Train) 1.000 1.000 1.000 1.000
14 Random Forest(Tuned)-Model-8(Test) 0.595 0.595 0.626 0.611
15 Random Forest(Tuned)-Model-8(Train) 0.598 0.601 0.632 0.616
16 AdaBoostClassifier-Model-9(Test) 0.601 0.604 0.618 0.611
17 AdaBoostClassifier-Model-9(Train) 0.606 0.612 0.622 0.617
array([[0.58217837, 0.41782163],
[0.62052135, 0.37947865],
[0.41944189, 0.58055811],
[0.60542716, 0.39457284],
[0.64281061, 0.35718939]])
array([0.41782163, 0.37947865, 0.58055811, ..., 0.48325558, 0.45416274,
0.51554871])
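The second column of the probability array appears to be the positive-class probability. Assuming the usual 0.5 cut-off (the notebook does not state one), thresholding that column reproduces the first five predicted labels shown for this model, `array([0, 0, 1, 0, 0])`. A small sketch with the printed values:

```python
# Positive-class probabilities for the first five rows, copied from the output above.
proba_positive = [0.41782163, 0.37947865, 0.58055811, 0.39457284, 0.35718939]

THRESHOLD = 0.5  # assumed cut-off; not stated in the notebook

# A row is labelled 1 when its positive-class probability reaches the threshold.
labels = [int(p >= THRESHOLD) for p in proba_positive]
print(labels)
```

Lowering `THRESHOLD` would trade precision for recall on class 1, which may matter more than raw accuracy for a completion-prediction task.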
Model-10-AdaBoostClassifier(Tuned)
{'learning_rate': 0.1, 'n_estimators': 200}
AdaBoostClassifier(learning_rate=0.1, n_estimators=200, random_state=42)
array([0, 0, 1, 0, 0])
precision recall f1-score support
0 0.60 0.58 0.59 9862
1 0.60 0.61 0.61 10138
accuracy 0.60 20000
macro avg 0.60 0.60 0.60 20000
weighted avg 0.60 0.60 0.60 20000
Name Accuracy Precision Recall F1 Score
0 Logit-Model-1(Test) 0.602 0.604 0.620 0.612
1 Logit-Model-1(Train) 0.608 0.614 0.629 0.621
2 KNN-Model-2(Test) 0.540 0.545 0.559 0.552
3 KNN-Model-2(Train) 0.709 0.714 0.718 0.716
4 GaussianNB-Model-3(Test) 0.598 0.606 0.591 0.598
5 GaussianNB-Model-3(Train) 0.599 0.610 0.597 0.603
6 Decision Tree-Model-4(Test) 0.523 0.530 0.525 0.528
7 Decision Tree-Model-4(Train) 1.000 1.000 1.000 1.000
8 Decision Tree(FI)-Model-5(Test) 0.530 0.537 0.531 0.534
9 Decision Tree(FI)-Model-5(Train) 1.000 1.000 1.000 1.000
10 Decision Tree(Tuned)-Model-6(Test) 0.544 0.564 0.444 0.497
11 Decision Tree(Tuned)-Model-6(Train) 0.553 0.580 0.452 0.508
12 Random Forest-Model-7(Test) 0.589 0.596 0.588 0.592
13 Random Forest-Model-7(Train) 1.000 1.000 1.000 1.000
14 Random Forest(Tuned)-Model-8(Test) 0.595 0.595 0.626 0.611
15 Random Forest(Tuned)-Model-8(Train) 0.598 0.601 0.632 0.616
16 AdaBoostClassifier-Model-9(Test) 0.601 0.604 0.618 0.611
17 AdaBoostClassifier-Model-9(Train) 0.606 0.612 0.622 0.617
18 AdaBoostClassifier(Tuned)-Model-10(Test) 0.600 0.603 0.615 0.609
array([1, 1, 0, 1, 1])
precision recall f1-score support
0 0.60 0.59 0.59 39168
1 0.61 0.62 0.62 40832
accuracy 0.61 80000
macro avg 0.61 0.61 0.61 80000
weighted avg 0.61 0.61 0.61 80000
Name Accuracy Precision Recall F1 Score
0 Logit-Model-1(Test) 0.602 0.604 0.620 0.612
1 Logit-Model-1(Train) 0.608 0.614 0.629 0.621
2 KNN-Model-2(Test) 0.540 0.545 0.559 0.552
3 KNN-Model-2(Train) 0.709 0.714 0.718 0.716
4 GaussianNB-Model-3(Test) 0.598 0.606 0.591 0.598
5 GaussianNB-Model-3(Train) 0.599 0.610 0.597 0.603
6 Decision Tree-Model-4(Test) 0.523 0.530 0.525 0.528
7 Decision Tree-Model-4(Train) 1.000 1.000 1.000 1.000
8 Decision Tree(FI)-Model-5(Test) 0.530 0.537 0.531 0.534
9 Decision Tree(FI)-Model-5(Train) 1.000 1.000 1.000 1.000
10 Decision Tree(Tuned)-Model-6(Test) 0.544 0.564 0.444 0.497
11 Decision Tree(Tuned)-Model-6(Train) 0.553 0.580 0.452 0.508
12 Random Forest-Model-7(Test) 0.589 0.596 0.588 0.592
13 Random Forest-Model-7(Train) 1.000 1.000 1.000 1.000
14 Random Forest(Tuned)-Model-8(Test) 0.595 0.595 0.626 0.611
15 Random Forest(Tuned)-Model-8(Train) 0.598 0.601 0.632 0.616
16 AdaBoostClassifier-Model-9(Test) 0.601 0.604 0.618 0.611
17 AdaBoostClassifier-Model-9(Train) 0.606 0.612 0.622 0.617
18 AdaBoostClassifier(Tuned)-Model-10(Test) 0.600 0.603 0.615 0.609
19 AdaBoostClassifier(Tuned)-Model-10(Train) 0.607 0.613 0.624 0.618
array([[0.56387764, 0.43612236],
[0.6892872 , 0.3107128 ],
[0.34625045, 0.65374955],
[0.69268473, 0.30731527],
[0.70780048, 0.29219952]])
array([0.43612236, 0.3107128 , 0.65374955, ..., 0.45255592, 0.42041945,
0.50738417])
Model-11-XGBClassifier
XGBClassifier(base_score=None, booster=None, callbacks=None,
colsample_bylevel=None, colsample_bynode=None,
colsample_bytree=None, device=None, early_stopping_rounds=None,
enable_categorical=False, eval_metric=None, feature_types=None,
feature_weights=None, gamma=None, grow_policy=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_bin=None, max_cat_threshold=None,
max_cat_to_onehot=None, max_delta_step=None, max_depth=None,
max_leaves=None, min_child_weight=None, missing=nan,
monotone_constraints=None, multi_strategy=None, n_estimators=None,
n_jobs=None, num_parallel_tree=None, ...)
array([0, 0, 1, 0, 1])
precision recall f1-score support
0 0.58 0.57 0.57 9862
1 0.59 0.60 0.59 10138
accuracy 0.58 20000
macro avg 0.58 0.58 0.58 20000
weighted avg 0.58 0.58 0.58 20000
Name Accuracy Precision Recall F1 Score
0 Logit-Model-1(Test) 0.602 0.604 0.620 0.612
1 Logit-Model-1(Train) 0.608 0.614 0.629 0.621
2 KNN-Model-2(Test) 0.540 0.545 0.559 0.552
3 KNN-Model-2(Train) 0.709 0.714 0.718 0.716
4 GaussianNB-Model-3(Test) 0.598 0.606 0.591 0.598
5 GaussianNB-Model-3(Train) 0.599 0.610 0.597 0.603
6 Decision Tree-Model-4(Test) 0.523 0.530 0.525 0.528
7 Decision Tree-Model-4(Train) 1.000 1.000 1.000 1.000
8 Decision Tree(FI)-Model-5(Test) 0.530 0.537 0.531 0.534
9 Decision Tree(FI)-Model-5(Train) 1.000 1.000 1.000 1.000
10 Decision Tree(Tuned)-Model-6(Test) 0.544 0.564 0.444 0.497
11 Decision Tree(Tuned)-Model-6(Train) 0.553 0.580 0.452 0.508
12 Random Forest-Model-7(Test) 0.589 0.596 0.588 0.592
13 Random Forest-Model-7(Train) 1.000 1.000 1.000 1.000
14 Random Forest(Tuned)-Model-8(Test) 0.595 0.595 0.626 0.611
15 Random Forest(Tuned)-Model-8(Train) 0.598 0.601 0.632 0.616
16 AdaBoostClassifier-Model-9(Test) 0.601 0.604 0.618 0.611
17 AdaBoostClassifier-Model-9(Train) 0.606 0.612 0.622 0.617
18 AdaBoostClassifier(Tuned)-Model-10(Test) 0.600 0.603 0.615 0.609
19 AdaBoostClassifier(Tuned)-Model-10(Train) 0.607 0.613 0.624 0.618
20 XGBClassifier-Model-11(Test) 0.583 0.588 0.598 0.593
array([1, 1, 0, 1, 1])
precision recall f1-score support
0 0.71 0.70 0.70 39168
1 0.71 0.73 0.72 40832
accuracy 0.71 80000
macro avg 0.71 0.71 0.71 80000
weighted avg 0.71 0.71 0.71 80000
Name Accuracy Precision Recall F1 Score
0 Logit-Model-1(Test) 0.602 0.604 0.620 0.612
1 Logit-Model-1(Train) 0.608 0.614 0.629 0.621
2 KNN-Model-2(Test) 0.540 0.545 0.559 0.552
3 KNN-Model-2(Train) 0.709 0.714 0.718 0.716
4 GaussianNB-Model-3(Test) 0.598 0.606 0.591 0.598
5 GaussianNB-Model-3(Train) 0.599 0.610 0.597 0.603
6 Decision Tree-Model-4(Test) 0.523 0.530 0.525 0.528
7 Decision Tree-Model-4(Train) 1.000 1.000 1.000 1.000
8 Decision Tree(FI)-Model-5(Test) 0.530 0.537 0.531 0.534
9 Decision Tree(FI)-Model-5(Train) 1.000 1.000 1.000 1.000
10 Decision Tree(Tuned)-Model-6(Test) 0.544 0.564 0.444 0.497
11 Decision Tree(Tuned)-Model-6(Train) 0.553 0.580 0.452 0.508
12 Random Forest-Model-7(Test) 0.589 0.596 0.588 0.592
13 Random Forest-Model-7(Train) 1.000 1.000 1.000 1.000
14 Random Forest(Tuned)-Model-8(Test) 0.595 0.595 0.626 0.611
15 Random Forest(Tuned)-Model-8(Train) 0.598 0.601 0.632 0.616
16 AdaBoostClassifier-Model-9(Test) 0.601 0.604 0.618 0.611
17 AdaBoostClassifier-Model-9(Train) 0.606 0.612 0.622 0.617
18 AdaBoostClassifier(Tuned)-Model-10(Test) 0.600 0.603 0.615 0.609
19 AdaBoostClassifier(Tuned)-Model-10(Train) 0.607 0.613 0.624 0.618
20 XGBClassifier-Model-11(Test) 0.583 0.588 0.598 0.593
21 XGBClassifier-Model-11(Train) 0.713 0.715 0.730 0.722
array([[0.6249052 , 0.3750948 ],
[0.7202593 , 0.2797407 ],
[0.29613912, 0.7038609 ],
[0.5834572 , 0.41654283],
[0.46869588, 0.5313041 ]], dtype=float32)
array([0.3750948, 0.2797407, 0.7038609, ..., 0.5532377, 0.5157955,
0.397631 ], dtype=float32)
Model-12-XGBClassifier(Tuned)
Fitting 5 folds for each of 32 candidates, totalling 160 fits
GridSearchCV(cv=5,
estimator=XGBClassifier(base_score=None, booster=None,
callbacks=None, colsample_bylevel=None,
colsample_bynode=None,
colsample_bytree=None, device=None,
early_stopping_rounds=None,
enable_categorical=False,
eval_metric='logloss', feature_types=None,
feature_weights=None, gamma=None,
grow_policy=None, importance_type=None,
interaction_constraint...
max_delta_step=None, max_depth=None,
max_leaves=None, min_child_weight=None,
missing=nan, monotone_constraints=None,
multi_strategy=None, n_estimators=None,
n_jobs=None, num_parallel_tree=None, ...),
n_jobs=-1,
param_grid={'colsample_bytree': [0.8, 1.0],
'learning_rate': [0.05, 0.1], 'max_depth': [3, 5],
'n_estimators': [100, 200], 'subsample': [0.8, 1.0]},
scoring='accuracy', verbose=2)
{'colsample_bytree': 0.8,
'learning_rate': 0.1,
'max_depth': 3,
'n_estimators': 100,
'subsample': 1.0}
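The "32 candidates, totalling 160 fits" line follows directly from the grid: five parameters with two values each give 2^5 = 32 combinations, each fitted on 5 folds. A minimal sketch that enumerates the grid with the standard library only (the variable names are illustrative, not from the notebook):

```python
from itertools import product

# Parameter grid copied from the GridSearchCV output above.
param_grid = {
    "colsample_bytree": [0.8, 1.0],
    "learning_rate": [0.05, 0.1],
    "max_depth": [3, 5],
    "n_estimators": [100, 200],
    "subsample": [0.8, 1.0],
}
CV_FOLDS = 5  # cv=5 in the GridSearchCV call

# Each combination of one value per parameter is one candidate.
names = list(param_grid)
candidates = [dict(zip(names, values)) for values in product(*param_grid.values())]

total_fits = len(candidates) * CV_FOLDS
print(len(candidates), total_fits)

# The reported best parameters are one of the enumerated candidates.
best_params = {
    "colsample_bytree": 0.8,
    "learning_rate": 0.1,
    "max_depth": 3,
    "n_estimators": 100,
    "subsample": 1.0,
}
print(best_params in candidates)
```

Because the count grows multiplicatively, adding a third value to each parameter would raise the search to 3^5 = 243 candidates, or 1,215 fits at five folds.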
XGBClassifier(base_score=None, booster=None, callbacks=None,
colsample_bylevel=None, colsample_bynode=None,
colsample_bytree=0.8, device=None, early_stopping_rounds=None,
enable_categorical=False, eval_metric='logloss',
feature_types=None, feature_weights=None, gamma=None,
grow_policy=None, importance_type=None,
interaction_constraints=None, learning_rate=0.1, max_bin=None,
max_cat_threshold=None, max_cat_to_onehot=None,
max_delta_step=None, max_depth=3, max_leaves=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
multi_strategy=None, n_estimators=100, n_jobs=None,
num_parallel_tree=None, ...)
array([0, 0, 1, 0, 0])
precision recall f1-score support
0 0.60 0.59 0.59 9862
1 0.60 0.61 0.61 10138
accuracy 0.60 20000
macro avg 0.60 0.60 0.60 20000
weighted avg 0.60 0.60 0.60 20000
Name Accuracy Precision Recall F1 Score
0 Logit-Model-1(Test) 0.602 0.604 0.620 0.612
1 Logit-Model-1(Train) 0.608 0.614 0.629 0.621
2 KNN-Model-2(Test) 0.540 0.545 0.559 0.552
3 KNN-Model-2(Train) 0.709 0.714 0.718 0.716
4 GaussianNB-Model-3(Test) 0.598 0.606 0.591 0.598
5 GaussianNB-Model-3(Train) 0.599 0.610 0.597 0.603
6 Decision Tree-Model-4(Test) 0.523 0.530 0.525 0.528
7 Decision Tree-Model-4(Train) 1.000 1.000 1.000 1.000
8 Decision Tree(FI)-Model-5(Test) 0.530 0.537 0.531 0.534
9 Decision Tree(FI)-Model-5(Train) 1.000 1.000 1.000 1.000
10 Decision Tree(Tuned)-Model-6(Test) 0.544 0.564 0.444 0.497
11 Decision Tree(Tuned)-Model-6(Train) 0.553 0.580 0.452 0.508
12 Random Forest-Model-7(Test) 0.589 0.596 0.588 0.592
13 Random Forest-Model-7(Train) 1.000 1.000 1.000 1.000
14 Random Forest(Tuned)-Model-8(Test) 0.595 0.595 0.626 0.611
15 Random Forest(Tuned)-Model-8(Train) 0.598 0.601 0.632 0.616
16 AdaBoostClassifier-Model-9(Test) 0.601 0.604 0.618 0.611
17 AdaBoostClassifier-Model-9(Train) 0.606 0.612 0.622 0.617
18 AdaBoostClassifier(Tuned)-Model-10(Test) 0.600 0.603 0.615 0.609
19 AdaBoostClassifier(Tuned)-Model-10(Train) 0.607 0.613 0.624 0.618
20 XGBClassifier-Model-11(Test) 0.583 0.588 0.598 0.593
21 XGBClassifier-Model-11(Train) 0.713 0.715 0.730 0.722
22 XGBClassifier(Tuned)-Model-12(Test) 0.601 0.605 0.613 0.609
array([1, 1, 0, 1, 1])
precision recall f1-score support
0 0.61 0.60 0.60 39168
1 0.62 0.63 0.62 40832
accuracy 0.61 80000
macro avg 0.61 0.61 0.61 80000
weighted avg 0.61 0.61 0.61 80000
Name Accuracy Precision Recall F1 Score
0 Logit-Model-1(Test) 0.602 0.604 0.620 0.612
1 Logit-Model-1(Train) 0.608 0.614 0.629 0.621
2 KNN-Model-2(Test) 0.540 0.545 0.559 0.552
3 KNN-Model-2(Train) 0.709 0.714 0.718 0.716
4 GaussianNB-Model-3(Test) 0.598 0.606 0.591 0.598
5 GaussianNB-Model-3(Train) 0.599 0.610 0.597 0.603
6 Decision Tree-Model-4(Test) 0.523 0.530 0.525 0.528
7 Decision Tree-Model-4(Train) 1.000 1.000 1.000 1.000
8 Decision Tree(FI)-Model-5(Test) 0.530 0.537 0.531 0.534
9 Decision Tree(FI)-Model-5(Train) 1.000 1.000 1.000 1.000
10 Decision Tree(Tuned)-Model-6(Test) 0.544 0.564 0.444 0.497
11 Decision Tree(Tuned)-Model-6(Train) 0.553 0.580 0.452 0.508
12 Random Forest-Model-7(Test) 0.589 0.596 0.588 0.592
13 Random Forest-Model-7(Train) 1.000 1.000 1.000 1.000
14 Random Forest(Tuned)-Model-8(Test) 0.595 0.595 0.626 0.611
15 Random Forest(Tuned)-Model-8(Train) 0.598 0.601 0.632 0.616
16 AdaBoostClassifier-Model-9(Test) 0.601 0.604 0.618 0.611
17 AdaBoostClassifier-Model-9(Train) 0.606 0.612 0.622 0.617
18 AdaBoostClassifier(Tuned)-Model-10(Test) 0.600 0.603 0.615 0.609
19 AdaBoostClassifier(Tuned)-Model-10(Train) 0.607 0.613 0.624 0.618
20 XGBClassifier-Model-11(Test) 0.583 0.588 0.598 0.593
21 XGBClassifier-Model-11(Train) 0.713 0.715 0.730 0.722
22 XGBClassifier(Tuned)-Model-12(Test) 0.601 0.605 0.613 0.609
23 XGBClassifier(Tuned)-Model-12(Train) 0.613 0.619 0.628 0.623
array([[0.55821514, 0.4417849 ],
[0.6628357 , 0.3371643 ],
[0.3925438 , 0.6074562 ],
[0.6607305 , 0.33926955],
[0.6040486 , 0.39595136]], dtype=float32)
array([0.4417849 , 0.3371643 , 0.6074562 , ..., 0.5285576 , 0.38694397,
0.5013471 ], dtype=float32)
PCA
| | Gender | Age | Education_Level | Employment_Status | City | Device_Type | Internet_Connection_Quality | Course_Name | Category | Course_Level | Course_Duration_Days | Instructor_Rating | Login_Frequency | Average_Session_Duration_Min | Video_Completion_Rate | Discussion_Participation | Time_Spent_Hours | Days_Since_Last_Login | Notifications_Checked | Peer_Interaction_Score | Assignments_Submitted | Assignments_Missed | Quiz_Attempts | Quiz_Score_Avg | Project_Grade | Progress_Percentage | Rewatch_Count | Payment_Mode | Fee_Paid | Discount_Used | Payment_Amount | App_Usage_Percentage | Reminder_Emails_Clicked | Support_Tickets_Raised | Satisfaction_Rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | -0.750 | 3.0 | 2 | 6 | 0 | 1.0 | 0 | 4 | 1.0 | 0.500000 | 0.666667 | -0.666667 | -0.285714 | -0.310345 | 0.0 | -0.385965 | -0.375 | 0.333333 | -0.714286 | 1.5 | -2.0 | 0.333333 | 0.439306 | 0.137441 | 0.976744 | -1.0 | 4 | 0 | 0 | -0.573628 | -0.703704 | 0.5 | 3.0 | -0.636364 |
| 1 | 0 | -1.000 | 2.0 | 2 | 4 | 0 | 2.0 | 4 | 4 | 0.0 | 1.500000 | 0.333333 | -0.333333 | 0.214286 | 0.693103 | 0.0 | -0.315789 | -0.125 | 0.000000 | 0.535714 | -0.5 | 0.5 | -0.333333 | 0.294798 | -1.222749 | 0.093023 | 0.0 | 0 | 1 | 0 | 0.706361 | 0.666667 | -1.0 | -1.0 | 0.272727 |
| 2 | 0 | 1.125 | 1.0 | 2 | 3 | 1 | 1.0 | 5 | 4 | 2.0 | 0.000000 | 0.333333 | 0.000000 | -1.785714 | 0.400000 | 0.5 | -0.385965 | 1.875 | 0.000000 | 0.142857 | 1.5 | -1.5 | -0.333333 | 1.543353 | 0.928910 | 1.441860 | 0.0 | 3 | 1 | 0 | 0.164101 | 0.629630 | -0.5 | -1.0 | 0.727273 |
| 3 | 0 | 0.500 | 3.0 | 0 | 13 | 1 | 0.0 | 7 | 1 | 2.0 | -0.166667 | -0.333333 | -1.000000 | -0.500000 | -0.024138 | -0.5 | 0.824561 | 1.875 | 1.333333 | 0.035714 | -2.5 | 2.5 | 0.000000 | -0.820809 | -0.800948 | -1.703488 | 1.0 | 5 | 1 | 0 | 0.028173 | -0.962963 | 0.0 | 2.0 | -0.363636 |
| 4 | 0 | -0.750 | 1.0 | 1 | 9 | 0 | 1.0 | 4 | 4 | 0.0 | 1.500000 | 0.333333 | -1.000000 | 0.142857 | 0.772414 | -0.5 | -0.385965 | 0.000 | 0.666667 | 0.428571 | 0.0 | 0.0 | 1.333333 | 0.664740 | 1.170616 | 0.633721 | 1.0 | 1 | 1 | 1 | 0.514377 | 0.851852 | 0.5 | -1.0 | -0.181818 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 99995 | 0 | 1.125 | 2.0 | 2 | 5 | 1 | 1.0 | 1 | 2 | 2.0 | -0.500000 | -0.666667 | 0.000000 | 1.000000 | 0.972414 | 1.0 | 0.877193 | 0.250 | 0.000000 | -0.714286 | 0.5 | -1.0 | -0.666667 | 0.589595 | 0.274882 | 1.046512 | 0.5 | 5 | 1 | 0 | -0.124891 | 1.185185 | 0.5 | 1.0 | -0.363636 |
| 99996 | 0 | -0.125 | 2.0 | 1 | 4 | 1 | 0.0 | 1 | 2 | 2.0 | -0.500000 | -0.666667 | 0.333333 | -0.142857 | 0.575862 | 1.5 | -0.385965 | 0.625 | -1.000000 | 0.750000 | 0.0 | 0.0 | 0.666667 | -0.086705 | 1.009479 | 0.505814 | -0.5 | 5 | 1 | 0 | -0.139994 | 0.111111 | 0.0 | -1.0 | -1.090909 |
| 99997 | 1 | -1.000 | 1.0 | 0 | 0 | 0 | 1.0 | 6 | 3 | 1.0 | 0.166667 | -1.000000 | -0.333333 | -0.071429 | 0.051724 | -1.0 | 0.543860 | 1.500 | -0.333333 | 0.357143 | 0.0 | 0.0 | -0.333333 | -0.971098 | 0.767773 | 0.180233 | 0.5 | 5 | 1 | 1 | 0.030497 | -0.259259 | 0.0 | -1.0 | -0.090909 |
| 99998 | 0 | 0.375 | 2.0 | 2 | 0 | 1 | 0.0 | 1 | 2 | 2.0 | -0.500000 | -0.666667 | 0.333333 | -0.785714 | -0.844828 | -0.5 | 0.526316 | -0.125 | 0.000000 | -0.321429 | 0.0 | 0.0 | 2.000000 | -0.248555 | 0.161137 | -0.383721 | 1.0 | 0 | 1 | 0 | -0.106884 | 0.370370 | -0.5 | -1.0 | 0.454545 |
| 99999 | 1 | 0.500 | 3.0 | 2 | 1 | 0 | 2.0 | 7 | 1 | 2.0 | -0.166667 | -0.333333 | 0.000000 | 1.214286 | -0.296552 | -1.0 | 0.070175 | 0.750 | 1.333333 | 0.321429 | 1.5 | -1.5 | 0.000000 | -0.578035 | 0.483412 | 1.005814 | 1.0 | 0 | 1 | 0 | 0.117630 | -0.481481 | -0.5 | -1.0 | -0.272727 |
100000 rows × 35 columns
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -1.003730 | -3.320966 | 1.234242 | 1.227996 | -2.046160 | -0.954942 | 3.557567 | 0.279726 | -1.167832 | -0.542033 | -0.108152 | 0.958137 | -0.644089 | -0.655836 | 0.519336 | -0.701383 | -0.409125 | -0.762163 | -0.315608 |
| 1 | -2.994903 | 0.657030 | -2.750106 | 2.027313 | 0.273107 | 0.912761 | -0.142815 | 0.226739 | 0.753094 | -0.947799 | 0.760818 | -0.486261 | 0.868133 | 0.751500 | -0.563946 | -0.210647 | 0.218324 | 1.317837 | 1.168343 |
| 2 | -4.009643 | 1.711504 | 0.240452 | 0.990107 | -2.684743 | -0.218804 | 0.253286 | -0.614127 | 2.193415 | 0.500803 | 0.371842 | 0.241922 | 0.659093 | 0.781887 | 1.504024 | 0.605035 | -1.495029 | -0.112257 | 0.068494 |
| 3 | 5.994561 | 3.840514 | 2.244081 | -1.774813 | 4.140742 | -0.296129 | 1.028160 | 0.415821 | -1.031120 | 2.199874 | 1.083793 | 0.687537 | 0.473026 | 0.122322 | -0.449173 | 0.496109 | 1.192114 | -0.414756 | 0.465401 |
| 4 | 1.998731 | 0.671759 | -1.740147 | 2.009950 | -1.108059 | 0.355780 | -1.273395 | -0.888741 | -0.169095 | 0.258297 | 0.233948 | 0.489905 | 1.303961 | 0.811921 | -0.540801 | 1.142155 | -0.653207 | 0.223974 | -0.277023 |
Train-Test-split-PCA
Optimization terminated successfully.
Current function value: 0.657397
Iterations 5
Logit Regression Results
==============================================================================
Dep. Variable: Completed No. Observations: 80000
Model: Logit Df Residuals: 79965
Method: MLE Df Model: 34
Date: Mon, 05 Jan 2026 Pseudo R-squ.: 0.05128
Time: 11:21:35 Log-Likelihood: -52592.
converged: True LL-Null: -55434.
Covariance Type: nonrobust LLR p-value: 0.000
================================================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------------------------
Gender 0.0272 0.014 2.007 0.045 0.001 0.054
Age 0.0067 0.011 0.609 0.542 -0.015 0.028
Education_Level 0.0083 0.008 1.059 0.290 -0.007 0.024
Employment_Status 0.0093 0.007 1.311 0.190 -0.005 0.023
City 0.0015 0.002 0.918 0.359 -0.002 0.005
Device_Type -0.0018 0.013 -0.139 0.889 -0.027 0.024
Internet_Connection_Quality 0.0094 0.011 0.867 0.386 -0.012 0.031
Course_Name -0.0020 0.004 -0.548 0.584 -0.009 0.005
Category 0.0065 0.010 0.678 0.498 -0.012 0.025
Course_Level 0.0355 0.026 1.371 0.170 -0.015 0.086
Course_Duration_Days 0.0927 0.032 2.858 0.004 0.029 0.156
Instructor_Rating -0.0092 0.019 -0.482 0.629 -0.046 0.028
Login_Frequency 0.0132 0.013 1.053 0.292 -0.011 0.038
Average_Session_Duration_Min -0.0053 0.010 -0.520 0.603 -0.025 0.015
Video_Completion_Rate -0.2981 0.045 -6.598 0.000 -0.387 -0.210
Discussion_Participation -0.0045 0.010 -0.463 0.643 -0.024 0.015
Time_Spent_Hours -0.2709 0.012 -23.544 0.000 -0.293 -0.248
Days_Since_Last_Login 0.0765 0.009 8.950 0.000 0.060 0.093
Notifications_Checked -0.0030 0.009 -0.323 0.747 -0.021 0.015
Peer_Interaction_Score -0.0100 0.011 -0.955 0.340 -0.031 0.011
Assignments_Submitted -0.0191 0.034 -0.566 0.571 -0.085 0.047
Assignments_Missed 0.0374 0.045 0.838 0.402 -0.050 0.125
Quiz_Attempts 0.0244 0.011 2.201 0.028 0.003 0.046
Quiz_Score_Avg -0.1569 0.010 -15.026 0.000 -0.177 -0.136
Project_Grade 0.0043 0.011 0.380 0.704 -0.018 0.026
Progress_Percentage -0.3835 0.074 -5.171 0.000 -0.529 -0.238
Rewatch_Count -0.0083 0.010 -0.826 0.409 -0.028 0.011
Payment_Mode -0.0016 0.004 -0.419 0.676 -0.009 0.006
Fee_Paid -0.1507 0.043 -3.505 0.000 -0.235 -0.066
Discount_Used -0.0374 0.019 -1.957 0.050 -0.075 6.59e-05
Payment_Amount -0.2184 0.035 -6.159 0.000 -0.288 -0.149
App_Usage_Percentage -0.0284 0.010 -2.726 0.006 -0.049 -0.008
Reminder_Emails_Clicked 0.0091 0.009 0.966 0.334 -0.009 0.028
Support_Tickets_Raised -0.0048 0.008 -0.622 0.534 -0.020 0.010
Satisfaction_Rating 0.0177 0.012 1.505 0.132 -0.005 0.041
================================================================================================
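Two of the headline numbers in the summary can be checked by hand. The sketch below assumes the reported Pseudo R-squ. is McFadden's definition (1 minus the ratio of the model and null log-likelihoods) and reads a coefficient as a log-odds change per one unit of the scaled feature. The log-likelihoods are printed rounded to whole numbers, so the recomputed value only agrees to about four decimals.

```python
import math

# Log-likelihoods as printed in the summary (rounded there).
log_likelihood = -52592.0
ll_null = -55434.0

# McFadden's pseudo R-squared: 1 - LL_model / LL_null.
pseudo_r2 = 1 - log_likelihood / ll_null
print(round(pseudo_r2, 4))

# Likelihood-ratio statistic that underlies the LLR p-value.
llr_stat = 2 * (log_likelihood - ll_null)
print(llr_stat)

# Odds ratio for a one-unit increase in scaled Time_Spent_Hours,
# holding the other features fixed (coefficient copied from the table).
time_spent_coef = -0.2709
odds_ratio = math.exp(time_spent_coef)
print(round(odds_ratio, 3))
```

An odds ratio below 1 means that a higher value of the feature is associated with lower odds of the positive class of `Completed`; which outcome is coded as 1 is not shown in this section.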
75721 0.421062
80184 0.324220
19864 0.637756
76699 0.310405
92991 0.313945
dtype: float64
[0, 0, 1, 0, 0]
precision recall f1-score support
0 0.60 0.58 0.59 9862
1 0.61 0.62 0.61 10138
accuracy 0.60 20000
macro avg 0.60 0.60 0.60 20000
weighted avg 0.60 0.60 0.60 20000
Name Accuracy Precision Recall F1 Score
0 Logit-Model-1(Test) 0.602 0.604 0.620 0.612
1 Logit-Model-1(Train) 0.608 0.614 0.629 0.621
2 KNN-Model-2(Test) 0.540 0.545 0.559 0.552
3 KNN-Model-2(Train) 0.709 0.714 0.718 0.716
4 GaussianNB-Model-3(Test) 0.598 0.606 0.591 0.598
5 GaussianNB-Model-3(Train) 0.599 0.610 0.597 0.603
6 Decision Tree-Model-4(Test) 0.523 0.530 0.525 0.528
7 Decision Tree-Model-4(Train) 1.000 1.000 1.000 1.000
8 Decision Tree(FI)-Model-5(Test) 0.530 0.537 0.531 0.534
9 Decision Tree(FI)-Model-5(Train) 1.000 1.000 1.000 1.000
10 Decision Tree(Tuned)-Model-6(Test) 0.544 0.564 0.444 0.497
11 Decision Tree(Tuned)-Model-6(Train) 0.553 0.580 0.452 0.508
12 Random Forest-Model-7(Test) 0.589 0.596 0.588 0.592
13 Random Forest-Model-7(Train) 1.000 1.000 1.000 1.000
14 Random Forest(Tuned)-Model-8(Test) 0.595 0.595 0.626 0.611
15 Random Forest(Tuned)-Model-8(Train) 0.598 0.601 0.632 0.616
16 AdaBoostClassifier-Model-9(Test) 0.601 0.604 0.618 0.611
17 AdaBoostClassifier-Model-9(Train) 0.606 0.612 0.622 0.617
18 AdaBoostClassifier(Tuned)-Model-10(Test) 0.600 0.603 0.615 0.609
19 AdaBoostClassifier(Tuned)-Model-10(Train) 0.607 0.613 0.624 0.618
20 XGBClassifier-Model-11(Test) 0.583 0.588 0.598 0.593
21 XGBClassifier-Model-11(Train) 0.713 0.715 0.730 0.722
22 XGBClassifier(Tuned)-Model-12(Test) 0.601 0.605 0.613 0.609
23 XGBClassifier(Tuned)-Model-12(Train) 0.613 0.619 0.628 0.623
24 PCA-Logit-Model-13 0.602 0.605 0.619 0.612
Model-14-Decision Tree (PCA)
DecisionTreeClassifier(random_state=42)
array([1, 0, 1, 0, 0])
precision recall f1-score support
0 0.52 0.52 0.52 9862
1 0.53 0.53 0.53 10138
accuracy 0.52 20000
macro avg 0.52 0.52 0.52 20000
weighted avg 0.52 0.52 0.52 20000
Name Accuracy Precision Recall F1 Score
0 Logit-Model-1(Test) 0.602 0.604 0.620 0.612
1 Logit-Model-1(Train) 0.608 0.614 0.629 0.621
2 KNN-Model-2(Test) 0.540 0.545 0.559 0.552
3 KNN-Model-2(Train) 0.709 0.714 0.718 0.716
4 GaussianNB-Model-3(Test) 0.598 0.606 0.591 0.598
5 GaussianNB-Model-3(Train) 0.599 0.610 0.597 0.603
6 Decision Tree-Model-4(Test) 0.523 0.530 0.525 0.528
7 Decision Tree-Model-4(Train) 1.000 1.000 1.000 1.000
8 Decision Tree(FI)-Model-5(Test) 0.530 0.537 0.531 0.534
9 Decision Tree(FI)-Model-5(Train) 1.000 1.000 1.000 1.000
10 Decision Tree(Tuned)-Model-6(Test) 0.544 0.564 0.444 0.497
11 Decision Tree(Tuned)-Model-6(Train) 0.553 0.580 0.452 0.508
12 Random Forest-Model-7(Test) 0.589 0.596 0.588 0.592
13 Random Forest-Model-7(Train) 1.000 1.000 1.000 1.000
14 Random Forest(Tuned)-Model-8(Test) 0.595 0.595 0.626 0.611
15 Random Forest(Tuned)-Model-8(Train) 0.598 0.601 0.632 0.616
16 AdaBoostClassifier-Model-9(Test) 0.601 0.604 0.618 0.611
17 AdaBoostClassifier-Model-9(Train) 0.606 0.612 0.622 0.617
18 AdaBoostClassifier(Tuned)-Model-10(Test) 0.600 0.603 0.615 0.609
19 AdaBoostClassifier(Tuned)-Model-10(Train) 0.607 0.613 0.624 0.618
20 XGBClassifier-Model-11(Test) 0.583 0.588 0.598 0.593
21 XGBClassifier-Model-11(Train) 0.713 0.715 0.730 0.722
22 XGBClassifier(Tuned)-Model-12(Test) 0.601 0.605 0.613 0.609
23 XGBClassifier(Tuned)-Model-12(Train) 0.613 0.619 0.628 0.623
24 PCA-Logit-Model-13 0.602 0.605 0.619 0.612
25 PCA-Decision Tree-Model-14 0.523 0.530 0.525 0.528
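One way to read the scoreboard is to compare train and test accuracy for each model, since a large gap usually points to overfitting. The sketch below hard-codes a few (test, train) pairs copied from the rows above; the dictionary layout and variable names are illustrative, not taken from the notebook.

```python
# (test accuracy, train accuracy) pairs copied from the scoreboard.
accuracy = {
    "Decision Tree-Model-4": (0.523, 1.000),
    "Random Forest-Model-7": (0.589, 1.000),
    "XGBClassifier-Model-11": (0.583, 0.713),
    "XGBClassifier(Tuned)-Model-12": (0.601, 0.613),
}

# Train-minus-test gap per model, rounded to the scoreboard's precision.
gaps = {name: round(train - test, 3) for name, (test, train) in accuracy.items()}

# Print the models from largest to smallest gap.
for name, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {gap:.3f}")
```

On these rows the untuned tree-based models show the widest gaps, while the tuned XGBoost model stays within about 0.012 of its training accuracy.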